By Clyde Hissong

In an attempt to overcome the weaknesses of the traditional school 
organization many progressive schools have developed new programs. 
These programs are so similar in character that collectively the 
changes have been referred to as the activity movement. This 
movement has claimed the center of the educational stage for a length
of time sufficient to have engendered widespread interest in its
outcomes and in its basic philosophy.

In Doctor Hissong’s study an attempt has been made to discover 
the principles underlying the present activity movement, to determine 
the influence of traditional concepts in shaping the trends of the 
movement, and to see if in the light of the present knowledge of the 
child and his relation to his environment the movement rests upon a 
justifiable basis. 
THE NATURE VERSUS NURTURE PROBLEM

PART I. DEFINITION OF THE PROBLEM

FRANK K. SHUTTLEWORTH
Yale University


This paper presents a more explicit definition of the nature versus 
nurture problem together with a catalogue of the conditions, limita- 
tions, and implications involved in the formulation and solution of the 
problem. The attempt has been made to make the discussion intelligible
to the non-mathematical reader, and statistical aspects of the
problem have been placed in footnotes. A second paper presents
a solution of the problem as applied to intelligence. This is necessarily 
statistical, but it is hoped that the concluding sections which develop 
certain implications of the solution will also be intelligible to the 
reader who has not been statistically trained. 


STATEMENT OF THE PROBLEM 


Viewed in the largest perspective the nature versus nurture problem 
poses the following question: What promises to be the most strategic 
method or methods of improving the health, intelligence, and general 
well-being of mankind over the next fifty years? Over the next 
thousand years? So far the most strategic point of attack has been 
the environmental factor: Raising the standard of living, pushing 
forward the frontiers of the medical and biological sciences, and 
providing better schools and a more stimulating intellectual environment.
Sometime in the near or far distant future the most strategic
point of attack will be the hereditary factor: Denying parenthood to
those who are apt to pass on physical or mental disabilities. One of
the grand problems for man’s control of his own destiny is the
determination of the changing emphases to be placed on the environmental
or the hereditary attack from time to time in respect to each significant 
human variable. The exercise of this control presupposes the answers 
to two types of questions. First, there are questions of fact, including
hypothetical questions of fact. For example, how much could we 
increase the general level of health and the average IQ of the next 
generation of children by providing a quality of physical and medical 
care and of intellectual stimulation now enjoyed by only the most 
fortunate five per cent of the children of today? How much could 
we increase the general level of health and the average IQ of the next 
generation of children by denying parenthood to the one per cent or 
five per cent or ten per cent of those who are most apt to pass on 
physical or mental disabilities? Suppose parenthood were denied 
to all individuals failing to achieve a mental age of eight or ten or 
twelve years, how much of the improvement in the average IQ of the 
next generation should be attributed to the increase in native intellec- 
tual endowments, how much to the increase in the quality of home 
care and intellectual stimulation which is a by-product of denying 
parenthood to the less intelligent, and how much to the joint contribu- 
tion of better endowments and better care? 

Second, man’s control over his own destiny presupposes the answers 
to questions of policy and of ways and means. For example, what 
is the best way of raising the standard of living to the point where it 
will be possible to provide all children with the physical and medical 
care, the schooling, and the intellectual stimulation now enjoyed by 
only the most fortunate five per cent of the children of today? How 
long will this take? What is the best way of educating the public 
to the desirability (?) of segregating or sterilizing the one per cent or 
five per cent or ten per cent of the least fit? How long will this take? 
In the light of all the facts, what, for each significant human variable, 
are the most strategic emphases to be given from time to time to the 
hereditary and to the environmental attacks on the problem of improv- 
ing the life of man? Questions of this sort are also fundamental to 
the larger problem, but for the most part they fall outside the scope
of this and the subsequent paper.


From a more immediate point of view the nature versus nurture 
problem poses the following question: What are the relative contribu- 
tions of hereditary differences and of environmental differences in 
accounting for individual differences in respect to each significant 
human variable? This question refers not only to status or cross- 
sectional variables such as height and intelligence, but also to develop- 
mental or longitudinal variables. What is the relative importance 
of hereditary versus environmental differences in determining
individual differences in the early or late onset of sexual maturation,
in the complicated patterns of ossification and skeletal growth, in 
the development of susceptibilities or immunities to disease, in the 
integration and stabilization of physiological functions, and in the 
growth of mechanical skills, musical talents, and intellectual abilities? 
What difference would it make in the health or the IQ of a particular 
child if he were reared from birth to age ten in the poorest five per cent 
or in the best five per cent of the available environments? What 
are the chances that parents with above average intelligence will 
be favored with offspring having above average native intellectual 
endowments? Do children resemble their parents in intelligence 
mainly for the reason that superior parents pass on superior endow- 
ments or mainly for the reason that superior parents provide superior 
care? How good an indicator of native intelligence is an obtained 
Stanford Binet IQ? How much is the average level of native intellec- 
tual endowments lowered per generation by the fact that the profes- 
sional and so-called upper classes average fewer children per family 
than the laboring and so-called lower classes? How much is the 
general level of native physical vitality and resistance to disease 
lowered per generation by the fact that medical science enables many 
individuals to achieve maturity and parenthood who otherwise would 
have perished in infancy or childhood?

The foregoing questions delimit the scope of the discussion and 
indicate the essentials of the nature versus nurture problem. We 
turn now to a formulation of the problem which will point directly 
to the necessary data and methods of analysis and which will provide 
the basic quantitative results for the derivation of answers to these 
questions. From this point of view the nature versus nurture problem 
requires that we make a complete accounting for individual differences 
in terms of the proportion of these individual differences which should 
be attributed to hereditary differences and to environmental differ- 
ences. It is to be emphasized at this point that we are not concerned 
with the relative importance of heredity and of environment. The 
coordinate importance of both should be axiomatic. Rather, we are 
concerned in the long run with the relative emphases which should 
be given to hereditary and to environmental methods of improving 
the life of man. More immediately, we are concerned with the relative 
contributions of hereditary differences and of environmental differences 
in accounting for individual differences in respect to any particular 
variable. 
This discussion leads to the following preliminary statement of
the problem: What percentage of the variance, or of the individual
differences, in respect to significant variables should be attributed
to hereditary differences and what percentage should be attributed
to environmental differences? The variance, or the square of the
standard deviation, is simply the most convenient statistical measure
of the extent or range of individual differences. The variance is the
most convenient measure because variances can be added and
subtracted whereas standard deviations and interquartile ranges can not.
Purely for purposes of illustration, let us suppose in a population of
foster children that the standard deviation of IQ’s is ten points,
indicating that sixty-eight per cent of the children have IQ’s falling
between ninety and one hundred ten. Then, the variance is the square of
ten IQ points or one hundred. Let us suppose that sixty-four per cent
and thirty-six per cent of the IQ variance or of the individual
differences are due to hereditary and to environmental differences
respectively. Then, the IQ variance due to hereditary differences
is sixty-four and the standard deviation of IQ’s due to hereditary
differences is the square root of sixty-four or eight IQ points; similarly,
the environmental IQ variance is thirty-six and the environmental
standard deviation is six IQ points. Now, under these conditions
the correlation corrected for errors of measurement between the IQ’s
of the foster children and a very complete battery of environmental
measures should be .60 (equivalent to the environmental standard
deviation of six IQ points divided by the total standard deviation
of ten IQ points). Further, the correlation corrected for errors of
measurement between the IQ’s of pairs of foster siblings (pairs of
unrelated children adopted and reared together from an early age
in the same family) should be .36 (equivalent to the square of the
correlation of .60 or equivalent to the environmental IQ variance of
thirty-six divided by the total IQ variance of one hundred).
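The foster-children arithmetic can be checked in a few lines. The Python sketch below follows the text's own assumptions (hereditary and environmental parts uncorrelated in foster homes, all correlations already corrected for errors of measurement); the helper name `foster_example` is hypothetical and not taken from the paper.

```python
import math


def foster_example(total_sd: float, hereditary_share: float) -> dict:
    """Split IQ variance for foster children, assuming the hereditary and
    environmental parts are uncorrelated, so their variances simply add."""
    total_var = total_sd ** 2                     # 10 ** 2 = 100
    var_h = hereditary_share * total_var          # hereditary variance
    var_e = total_var - var_h                     # environmental variance
    return {
        "sd_h": math.sqrt(var_h),                 # hereditary SD in IQ points
        "sd_e": math.sqrt(var_e),                 # environmental SD in IQ points
        # IQ vs. a complete environmental battery: sigma_e / sigma_x
        "r_battery": math.sqrt(var_e) / total_sd,
        # Foster siblings share only environment: sigma_e^2 / sigma_x^2
        "r_foster_sibs": var_e / total_var,
    }


result = foster_example(total_sd=10, hereditary_share=0.64)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

The foster-sibling correlation comes out as the square of the battery correlation, which is why the text can offer two routes to the same answer.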
Having explained the phrase “percentage of the variance” and
having suggested two methods of solving the problem for foster children,
we need to amplify the above statement somewhat. For purposes
of convenient solution of the problem it is desirable to distinguish
two kinds of environmental differences. First, there are environmental
differences such as family income, cultural status of home, and
intelligence of parents which make the environment provided by one
family different from that provided by another. These are the factors
which are commonly referred to as environmental, but in this paper
they will be more specifically labelled as inter-family environmental
differences. Second, there are environmental differences which operate
within a family and which can not be measured by such factors as
family income, cultural status, or parental intelligence. These are
the factors which cause even identical twins reared in the same family 
to have slightly different environments and will be referred to as 
accidents and intra-family environmental differences. In addition it 
is necessary to take account of the joint contribution of hereditary 
differences and of inter-family environmental differences. This joint 
contribution is large or small depending on whether the correlation 
between the quality of native endowments and the quality of
environments is large or small. In the case of foster children this correlation
is presumably zero, in the general population it is presumably positive. 
So far this aspect of the problem has been wholly neglected, but in 
the case of intelligence and many other variables it is crucial for the 
purpose of judging the promise of eugenic proposals. 

Again, a hypothetical numerical example is the best illustration. 
Let us suppose in the general population that the correlation is .50, 
i.e., that children with superior native endowments tend to be born 
into superior environments. Let us suppose, as in the case of the 
foster children, that the standard deviations of IQ’s due to hereditary 
and to environmental differences are eight and six IQ points respec- 
tively. Then, the total variance is the square of eight IQ points plus 
the square of six IQ points plus two times eight times six times the 
correlation of .50 giving a total variance of one hundred forty-eight 
instead of one hundred. Hence, the range of individual differences 
among children reared in the homes of their true parents should nor- 
mally be larger than among foster children. Under these conditions 
the variance or individual differences in intelligence should be attributed
43.2 per cent to hereditary differences (64/148), 24.3 per cent to
environmental differences (36/148), and 32.4 per cent to the joint
contribution of hereditary and of environmental differences (48/148).
The clue to the determination of the correlation between the quality 
of native endowments and the quality of environments and to the 
determination of the joint contribution of hereditary and of environ- 
mental differences lies in the fact that in this case the correlation 
between IQ’s and battery of environmental measures corrected for 
errors of measurement should be .82 instead of .60. 
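The same bookkeeping extends to the general-population example, where hereditary and environmental parts are correlated. This Python sketch assumes, as the text does, a two-part model $x = h + e$ with standard deviations of eight and six IQ points and a correlation of .50; the helper name `correlated_example` is hypothetical. The correlation with a complete environmental battery is computed as $(\sigma_e^2 + r_{he}\sigma_h\sigma_e)/(\sigma_x\sigma_e)$, i.e. the covariance of $x$ with $e$ divided by the two standard deviations.

```python
import math


def correlated_example(sd_h: float, sd_e: float, r_he: float) -> dict:
    """Variance accounting for x = h + e when h and e are correlated."""
    var_h, var_e = sd_h ** 2, sd_e ** 2
    joint = 2 * r_he * sd_h * sd_e               # joint contribution
    total = var_h + var_e + joint                # 64 + 36 + 48 = 148
    cov_xe = var_e + r_he * sd_h * sd_e          # covariance of x with e
    return {
        "total_variance": total,
        "pct_hereditary": 100 * var_h / total,
        "pct_environmental": 100 * var_e / total,
        "pct_joint": 100 * joint / total,
        "r_battery": cov_xe / (math.sqrt(total) * sd_e),
    }


population = correlated_example(sd_h=8, sd_e=6, r_he=0.50)
foster = correlated_example(sd_h=8, sd_e=6, r_he=0.0)
for label, res in (("true-parent homes", population), ("foster homes", foster)):
    print(label, {k: round(v, 3) for k, v in res.items()})
```

Setting `r_he` to zero recovers the foster-children figures (total variance 100, battery correlation .60), so in this sketch the rise from .60 to about .82 comes entirely from the correlation between endowment and environment.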

Finally, it is desirable to eliminate from the problem such disturb- 
ing factors as errors of measurement and age and sex differences. 
These distinctions lead to the following statement of the nature versus
nurture problem: What percentage of the true variance or of the true 
individual differences in respect to such variables as height or intelli- 
gence in a clearly defined population of the same age and sex should 
be attributed to hereditary differences, to accidents and intra-family 
environmental differences, to inter-family environmental differences, 
and to the joint contribution of hereditary and of inter-family
environmental differences?¹ Given a solution of this problem, then the
quantitative findings together with supplementary data may be used 
to solve a wide range of general and special questions such as those 
suggested in the preceding pages. 
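Footnote 1 states the variance identity behind this four-way attribution. Expanding the square term by term shows where the joint contribution comes from. The only assumptions are those of the footnote: $x_i = h_i + a_i + e_i$, with each component itself a deviation with mean zero, and $r_{ha} = r_{ea} = 0$. Dividing each of the four surviving terms by $\sigma_x^2$ then gives the four percentages the problem asks for.

```latex
\begin{aligned}
\sigma_x^2 &= \frac{1}{n}\sum_{i=1}^{n} \left(h_i + a_i + e_i\right)^2 \\
  &= \frac{1}{n}\sum_{i} h_i^2 + \frac{1}{n}\sum_{i} a_i^2 + \frac{1}{n}\sum_{i} e_i^2
     + \frac{2}{n}\sum_{i} h_i a_i + \frac{2}{n}\sum_{i} h_i e_i + \frac{2}{n}\sum_{i} e_i a_i \\
  &= \sigma_h^2 + \sigma_a^2 + \sigma_e^2
     + 2 r_{ha}\sigma_h\sigma_a + 2 r_{he}\sigma_h\sigma_e + 2 r_{ea}\sigma_e\sigma_a \\
  &= \sigma_h^2 + \sigma_a^2 + \sigma_e^2 + 2 r_{he}\,\sigma_h\sigma_e
     \qquad \text{since } r_{ha} = r_{ea} = 0 .
\end{aligned}
```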


INADEQUATE FORMULATIONS OF THE PROBLEM 


It has already been emphasized that we are not concerned with the 
relative importance of heredity and of environment. The coordinate 
importance and limitations of both should be axiomatic. It must 
now be emphasized that the nature versus nurture problem can only 
be solved by the study of human materials. There is a perennial 
temptation to define and solve the human problem in terms of analogies 
drawn from experiences with flowers, fruit, and cattle and from 
experiments with rats, guinea pigs, and flies. Indeed, many com- 
petent biologists have urged that studies of human materials be 
abandoned and that all research be concentrated on lower forms of 
life. The importance of work with lower forms of life can not be 
denied. At the same time the significance of such studies for the 
nature versus nurture problem has been greatly exaggerated and 
attempts to solve the human problem by analogy have led to no end 
of confusion. 

Of the multitude of problems which dominate research on lower 
forms of life, three major types have been most frequently cited as 





¹ The statistical statement of the problem is as follows. Let $x_1, x_2, \ldots, x_n$ be
true deviations from the mean of a population of the same age and sex on any
variable such as height or intelligence. The deviation of each individual’s score
from the mean of the population may be expressed as $x_1 = h_1 + a_1 + e_1, \ldots, x_n = h_n + a_n + e_n$,
in which $h_1, \ldots, h_n$ represent the amount of the $x$ deviations
to be attributed to hereditary differences, in which $a_1, \ldots, a_n$ represent the amount
of the $x$ deviations to be attributed to accidents and intra-family environmental
differences, and in which $e_1, \ldots, e_n$ represent the amount of the $x$ deviations to be
attributed to inter-family environmental differences. By definition $r_{ha}$ and $r_{ea}$ are
zero. Squaring, summing, and dividing by $n$ gives

$$\sigma_x^2 = \sigma_h^2 + \sigma_a^2 + \sigma_e^2 + 2 r_{he}\,\sigma_h\sigma_e$$
significant for the nature versus nurture problem. First, there is the
problem of developing new flowers, better plants and grains, more 
efficient hogs, poultry, and cattle which in the work of Luther Burbank 
achieved the level of the miraculous. The general procedure involves 
highly selective breeding and the ruthless destruction of nine hundred
ninety-nine out of a thousand specimens over many generations, a 
general procedure which must be radically modified as to details in 
working with any particular species. Second, there is the problem 
of understanding the mechanism by which a relatively undifferentiated 
bit of protoplasm becomes a highly organized and integrated organism, 
how, for example, a salamander egg becomes a salamander. Here the 
procedure is to bathe the eggs in nutritive and deficient cultures, to 
subject them to extremes of heat and cold, to burn them with acids, 
to apply intensive electrical and X-ray stimulation, and to transplant 
and graft parts of the developing organism from head to tail and back 
again and even to other species. Though less publicized, the results
are as spectacular as those of a Burbank. Given complete control
of the environment, a salamander egg can be transformed into an
adult organism which has no resemblance whatever to a normal 
salamander and, for that matter, no resemblance whatever to any 
other normal organism. Essentially all that these two types of studies 
demonstrate are the axiomatic propositions that both hereditary and 
environmental factors have enormous potentialities and that both 
are subject to obvious limitations, namely, that no amount of breeding 
can make wheat grow from thistles and that no extremes of environ- 
mental pressures can transform a salamander egg into an elephant. 
It is only in the most limited sense, as when we segregate one obviously 
defective individual out of a hundred thousand or feed thyroxin to 
cretins, that these approaches have any significance for the nature 
versus nurture problem. 

Third, there is the problem of the gene and of the mechanism of 
Mendelian heredity in which the hybrid (sic) fruit fly, Drosophila
melanogaster, is now the center of feverish study. The procedure 
here is to observe the presence or absence of certain characters or traits 
and the way in which these are linked or occur together. On the basis 
of inferences drawn from these observations there has been erected 
a most remarkable super-structure of closely integrated fact and 
theory. Unlike the first and second types of approach, this has very
definite application to all human variables which are clearly
dichotomous or categorical. Thus, traits such as color blindness versus
tomous or categorical. Thus, traits such as color blindness versus 
normal color vision, tasting versus non-tasting of phenylthiocarbamide,
hemophilia versus normal coagulation of the blood, six fingers versus 
five, and the four blood groups are inherited and transmitted according 
to Mendelian laws. It is to be noticed that environmental influences 
are rarely involved in these categorical traits and, hence, that the 
hereditary nature of such traits is comparatively easy to determine, but 
even here the study of human materials is imperative. Again, how- 
ever, it is only in a limited sense, as when traits are dichotomous, 
categorical, present or absent, black or white, that this approach can 
contribute to the human problem. The mechanism of Mendelian 
heredity can contribute nothing as yet to the understanding of such 
variables as intelligence, musical ability, physical vitality, longevity, 
bodily size and growth, early or late sexual maturation, physiological 
functioning, etc., etc., where human differences lie on a continuous 
scale shading imperceptibly from white through many degrees of 
gray to black and where environmental differences quite obviously 
play some rôle in the determination of individual differences.

Finally, it should be noted that the two alleged advantages of 
research on lower forms of life are the very features which dilute the 
significance of their findings for the nature versus nurture problem. 
First, it is urged that research on lower forms of life permits an inten- 
sive attack on the hereditary factors through highly selective breeding 
and ruthless destruction of the unfit over many generations and an 
equally intensive attack on the environmental factors through the 
ability to vary a multitude of factors to known and extreme degrees. 
But, we are not interested in what might be accomplished with human 
beings by the application of extreme procedures which are so impos- 
sible as to be fantastic. We are interested to know what can be 
accomplished or should be attempted in the near or not too distant 
future by the segregation or sterilization of a very small fraction of the 
adult population. Would the results justify the costs and the multi- 
tude of inconveniences, interferences, and heartaches? We are not 
interested in the influence of wholly hypothetical, imaginary, and 
impossible environmental pressures on the IQ. We are interested 
in the influence of the existing environmental differences and of the 
available or prospective therapeutic procedures. 

Second, it is urged that research on lower forms of life permits a 
rigid, highly controlled, experimental attack: The environmental 
factors may be held constant while hereditary factors are manipulated, 
or, hereditary factors may be held constant while environmental 
factors are manipulated. For many purposes, as in studies of the
rôle of magnesium or iron in an adequate diet, the importance of an
experimental study with rats can not be overemphasized. But, 
the nature versus nurture problem involves much more than a deter- 
mination of the influence of hereditary differences when environmental 
differences are held constant or a determination of the influence of 
environmental differences when hereditary differences are held con- 
stant. The essence of the problem is the relative contributions of 
hereditary and of environmental differences. Further, in the case of 
intelligence and many other variables, it is essential that we have a 
determination of the joint contribution of hereditary and of environ- 
mental differences. 

Lest this discussion seem overcritical of the significance for the
nature versus nurture problem of work with lower forms of life, it should 
be recognized that the temptation to solve the human problem by 
analogy has been motivated by the difficulties and complexities of the 
problem and by the ambiguities and inadequacies of the vast majority 
of the studies of human materials. We turn now to the consideration 
of some of these difficulties. 


NECESSARY CONDITIONS AND LIMITATIONS 


This section consists of a catalogue of conditions which must be 
met, of the kinds of data which must be collected, of the assumptions 
which must be made, and of the limitations involved in a solution of 
the nature versus nurture problem. 

1. The human variables to be studied must be continuous, nor- 
mally distributed, and amenable to objective measurement to a known 
degree of reliability. 

2. The collection of certain special types of data is necessary. 
Data should be collected on pairs of foster siblings, that is, pairs of 
unrelated children who have been adopted and reared together from 
an early age in the same foster homes. The resemblance between the 
pairs of foster siblings indicates the contribution of inter-family 
environmental differences. Data should be collected on pairs of 
identical twins reared together in the homes of their true parents. 
The degree to which the resemblance between the pairs of identical 
twins falls short of unity indicates the contribution of accidents and 
of intra-family environmental differences. A battery of environmental
measures is essential. Such batteries are available for the
measurement of environmental differences which stimulate intelligence,
but we have yet to undertake the construction of comparable
batteries for the measurement of environmental differences which are 
important for such factors as dental caries or nutritional status. 
The reliability of the several environmental measures and the adequacy 
of the battery as a whole should be checked by procedures to be 
outlined in a subsequent paper. Normally, the correlation between 
the environmental measures and the variable under investigation will 
be higher in the case of children reared in the homes of their true 
parents than in the case of children reared in foster homes and the 
difference between the two correlations may be used to determine the 
joint contribution of hereditary and of environmental differences. 
When these three factors have been accounted for, the remainder of the
variance or of the individual differences is to be attributed to hereditary
differences.

3. Certain disturbing factors, as errors of measurement, sex 
differences, and the age or maturation factor, must be eliminated or 
held constant. It is axiomatic that, given a population of males 
reared from conception to age fifteen in average and strictly identical 
environments, then all of the individual differences must be attributed 
to hereditary differences and nothing but hereditary differences; 
conversely, when the hereditary factor is held constant all of the 
individual differences must be attributed to environmental differences 
and nothing but environmental differences. Hence, when children 
of the same age are studied, the maturation factor is not involved. 
The relation of the maturation factor to the nature versus nurture 
problem is to be studied by comparing the respective contributions 
of hereditary and of environmental differences to variance at age 
fifteen with their respective contributions at age five. 

4. It is important that the range or extent of the hereditary and 
of the environmental differences be clearly defined. This means 
that the problem must be formulated in terms of all the children or of a 
defined class of the children living in a defined area at a defined time. 
If the area is the state of Iowa, the time 1935, and the population
third-generation native white, non-Hebrew ten-year-old boys of north
European stock, then this formulation not only excludes from the
problem the maturation factor and sex differences, but it also excludes
a wide range of hereditary differences and of environmental differences.
Specifically, such a formulation excludes essentially all racial and
nationality differences, essentially all environmental differences
associated with climate, and such extremely unfavorable environmental
conditions as are typical of canal boat children or of children
reared in the mountains of eastern Tennessee. A solution of the 
problem for this situation cannot be expected to hold for China or 
California or even for a comparable population residing in Iowa in the 
year 2035. 

5. Linear relationships.¹

6. No assumption needs to be made concerning the nature of the 
interaction of hereditary and of environmental differences.²

7. In every case the size of the joint contribution of hereditary 
and of environmental differences or of the correlation between the 
quality of endowments and of environments must be measured.³
In human populations it cannot be assumed that this factor is negli- 
gible. In the case of intelligence this aspect of the problem is crucial 
for the purpose of judging the possibilities of eugenic attempts to raise 
the average level of native intelligence. 

8. It is to be noted that this formulation excludes from con- 
sideration all mortality cases. Given a thousand male conceptions 
with different heredities reared in average and strictly identical 
environments and only five hundred cases surviving to age ten, then
all the differences among the five hundred survivors are to be attributed
to hereditary differences, but the converse proposition that all the
hereditary differences are measured by the differences among the 
five hundred survivors is false. 





¹ The relationships of $h$, $a$, and $e$ to $x$ must be linear. That is, a given change
in $h$ must produce the same change in $x$ regardless of the absolute value of $h$, and
similarly for $a$ and $e$. There is involved in this condition the difficult question of
the equality of units of measurement, and it follows that the extent to which the
data meet this condition must be determined for each variable which is to be
studied. Obviously, only the linearity of the relation of $e$ to $x$ can be tested
empirically, and the linearity of $h$ to $x$ must be assumed.

² The statement that $x_i = h_i + a_i + e_i$ implies nothing about the action,
reaction, and interaction of $h$, $a$, and $e$ throughout the lifetime of the individual,
but merely states that, at the moment of measurement, the deviations due to $h$,
$a$, and $e$ are additive in accounting for $x$. This follows from the third and fifth
conditions listed above.

³ This should be clear from the formula given in the first footnote on page 566.
If $\sigma_h$ and $\sigma_e$ are equal and $r_{he}$ is negative unity, then the variability of $x$ is very
small and $\sigma_x$ equals $\sigma_a$. If $\sigma_h$ and $\sigma_e$ are equal and $r_{he}$ is positive unity, then the
variability of $x$ is very large and the contribution of the factor $2 r_{he}\sigma_h\sigma_e$ approaches
fifty per cent of the variance. It is apparent that if no measure of $r_{he}$ is available
it is incorrect to state conclusions in terms of $\sigma_x^2 = \sigma_h^2 + \sigma_a^2 + \sigma_e^2$ unless it is clear
that the result is to be regarded as a very rough approximation.
9. Finally, it must be recognized that it is impossible to study the
environmental differences which operate between conception and birth 
or between conception and the age at which the population of foster 
children is adopted. If the variable under investigation is the extent 
of dental caries among fourteen-year-old children, then the resemblance 
between pairs of foster siblings adopted on the average at six months 
of age should provide an excellent approximation to the contribution 
of environmental factors, but a similar attack on the problem of dental 
caries at age five would clearly underestimate the contribution of 
environmental factors. 
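Conditions 2 and 7 describe how the special kinds of data combine into a complete accounting, leaving the details to the subsequent paper. The Python sketch below is one possible reading, not the author's procedure. It assumes (these are our assumptions, not the paper's) that the hereditary, accidental, and inter-family variances are the same in foster and true-parent homes; that $r_{he}$ is zero in foster homes and non-negative in true-parent homes; that the battery captures inter-family environment completely; that unrelated foster siblings share only inter-family environment while identical twins share heredity and inter-family environment; and that every correlation is already corrected for errors of measurement. The function names and the component variances 49, 15, and 36 with $r_{he} = .50$ are made up for illustration.

```python
import math


def implied_correlations(var_h, var_a, var_e, r_he):
    """Hypothetical forward model: the correlations the special data of
    condition 2 would show under the assumptions stated above."""
    var_foster = var_h + var_a + var_e            # r_he = 0 in foster homes
    joint = 2 * r_he * math.sqrt(var_h * var_e)   # joint term in true homes
    var_true = var_foster + joint
    return {
        "r_foster_sibs": var_e / var_foster,      # share inter-family e only
        "r_twins": 1 - var_a / var_true,          # share h and e, but not a
        "r_battery_true": (var_e + joint / 2)
        / (math.sqrt(var_true) * math.sqrt(var_e)),
    }


def decompose(r_foster_sibs, r_twins, r_battery_true):
    """Recover the four shares of the true-parent-home variance, working in
    units where the foster-home variance equals 1."""
    s2 = r_foster_sibs                            # sigma_e^2 in those units
    k2 = r_battery_true ** 2
    # The joint term J satisfies k*s*sqrt(1 + J) = s^2 + J/2. Squaring gives
    # J^2 + 4 s^2 (1 - k^2) J + 4 s^2 (s^2 - k^2) = 0; take the larger root.
    b = 4 * s2 * (1 - k2)
    c = 4 * s2 * (s2 - k2)
    joint = (-b + math.sqrt(b * b - 4 * c)) / 2
    var_true = 1 + joint
    var_a = (1 - r_twins) * var_true              # twins fall short of unity
    var_h = 1 - s2 - var_a                        # the remainder is hereditary
    parts = {"hereditary": var_h, "accidents_intra_family": var_a,
             "inter_family": s2, "joint": joint}
    return {name: value / var_true for name, value in parts.items()}


# Made-up component variances, for illustration only.
observed = implied_correlations(var_h=49, var_a=15, var_e=36, r_he=0.5)
shares = decompose(**observed)
for name, share in shares.items():
    print(f"{name}: {100 * share:.1f} per cent")
```

The forward model manufactures the three correlations from known components, and `decompose` recovers shares of about 34.5, 10.6, 25.4, and 29.6 per cent; the round trip checks the algebra only and says nothing about real data.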

All of the foregoing conditions and limitations, with the exception 
of number six, which is merely clarifying, present difficulties. A 
complete catalogue of possible sources of error would provide a 
formidable list. Under these conditions it should be obvious that 
even the best solution is only an approximation, subject to a fairly 
large and indeterminate degree of error. 
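Footnote 3 to condition 7 gives the two extreme cases that make the joint term impossible to ignore. The short Python sketch below uses illustrative standard deviations $\sigma_h = \sigma_e = 6$ and $\sigma_a = 2$, chosen arbitrarily, and sweeps $r_{he}$ from negative to positive unity, reporting $\sigma_x$ and the joint term's share of the variance; `joint_share` is a hypothetical helper name.

```python
import math


def joint_share(sd_h: float, sd_a: float, sd_e: float, r_he: float):
    """Return (sigma_x, share of the variance due to 2 r_he sigma_h sigma_e)."""
    joint = 2 * r_he * sd_h * sd_e
    total = sd_h ** 2 + sd_a ** 2 + sd_e ** 2 + joint
    return math.sqrt(total), joint / total


for r in (-1.0, -0.5, 0.0, 0.5, 1.0):
    sd_x, share = joint_share(sd_h=6, sd_a=2, sd_e=6, r_he=r)
    print(f"r_he = {r:+.1f}: sigma_x = {sd_x:.2f}, joint share = {100 * share:.1f} per cent")
```

At negative unity the joint term cancels the hereditary and inter-family variance, leaving $\sigma_x = \sigma_a = 2$; at positive unity it supplies 72 of 148 units, about 48.6 per cent, and the share approaches fifty per cent as $\sigma_a$ shrinks toward zero.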


IMPLICATIONS 


Strictly speaking, the definition of a problem is complete with a 
formulation of the specific questions which must be answered and with 
a catalogue of the conditions and limitations necessary to the formula- 
tion and solution of the problem. The nature versus nurture con- 
troversy, however, has developed so much confusion that no definition 
of the problem can be regarded as adequate until the implications and 
practical consequences of possible solutions are clear. Purely for 
purposes of illustration and in order to make these implications quite 
specific, the discussion which follows indulges in a large number of 
outright assumptions concerning possible solutions. In the catalogue 
which follows the first five points are in the nature of cautions, limita- 
tions, and negative implications. 

1. It must be emphasized that a solution of the nature versus 
nurture problem for any specific variable contributes nothing to an 
understanding of the genetic basis of the variable or to an under- 
standing of the hereditary mechanism involved. Assume that 
ninety per cent of the individual differences in stature are to be 
attributed to hereditary differences; then, it follows that existing 
differences in diet, physical care, etc., are relatively unimportant in 
determining the observed differences in stature and that the deter- 
mining factors are genetic. But this says nothing whatever about 
the nature of the genetic factors: Possibly there are genetic differences 
for each specific bit of bone and cartilage which in combination
produce stature; possibly there are genetic differences for bone and
not for cartilage; possibly the genetic mechanism is glandular in whole
or in part. Suppose that sixty per cent of the individual differences
in spelling ability are to be attributed to hereditary differences. It
would be fantastic to conclude that there are genes for spelling ability;
more probably the individual differences in spelling ability are con-
ditioned by individual differences in intelligence, the precise genetic
basis of which must also remain unknown.

2. Similarly, solution of the nature versus nurture problem con- 
tributes nothing to an understanding of the environmental factors 
which may be involved. Assume that fifty per cent or seventy-five 
per cent or one hundred per cent of the individual differences in 
dental caries are to be attributed to environmental differences; then, 
there remains the problem of the nature of these environmental 
differences. Is the determining environmental factor Bacillus
acidophilus or deficient vitamin intake or deficient mineral intake or
failure to utilize an otherwise adequate diet or lack of sunlight or soft 
foods or what? A solution of the nature versus nurture problem can 
only indicate whether environmental differences account for fifty per 
cent or seventy-five per cent or one hundred per cent of the differences 
among children in the severity of dental caries. 

3. Even a precise solution of the nature versus nurture problem 
for all significant human variables could not provide a royal road to 
the perfection of man. The problem of intelligence is a case in point.
Available evidence indicates that not more than twenty per cent of 
the individual differences in intelligence are to be attributed to 
environmental differences; hence it would seem to follow that segrega-
tion or sterilization of the feeble-minded would solve the problem of
raising the general level of intelligence. But evidence presented
in a second paper indicates that the selection of parents who are apt to 
pass on deficient native intellectual endowments is unexpectedly 
difficult. What difficulties may be encountered when other variables 
are subjected to intensive study cannot be anticipated. 

4. The relative contribution of hereditary differences to variance 
or to individual differences is not purely and simply a measure of the 
potency or of the determining influence of hereditary differences. 
Rather, it is very largely a measure of the extent or range of the 
hereditary differences existing in a given population. The concepts 
of potency and of determining influence appear to hold very well for 
such factors as eye color or fingerprints, but they do not hold for other
variables. Assume that ninety per cent of the differences in stature 
are to be attributed to hereditary differences; assume an extreme 
eugenic attack segregating or sterilizing all adults who deviate by more 
than half an inch from the average stature for their sex; then, even- 
tually the individual differences in stature must become very small, 
the relative contribution of hereditary differences to the variance 
in stature must approximate zero, and the relative contribution of 
environmental differences to the restricted variance must be many 
times as great as formerly. It would be absurd to suggest in this 
situation that hereditary differences in respect to stature were less 
potent in the end than in the beginning. All that has happened is a 
change in the extent or range of the hereditary differences existing 
in the given population. This view suggests a very interesting 
corollary: The more successful the eugenic attack, the less the need for 
and the potentialities of a eugenic attack. It is irrelevant to the 
argument, but worth remarking, that many eugenists have a most 
naive faith in the environment, of which they are wholly unconscious:
All their evidence, arguments, and propaganda are designed to change
the mores, the culture, and the laws on the statute books (the environ-
ment) and the specific procedures of segregation and sterilization which 
they recommend represent the most extreme types of environmental 
pressures. 

5. The relative contribution of environmental differences to 
variance or to individual differences is not simply a measure of the 
potency or of the determining influence of environmental differences. 
Rather, it is very largely a measure of the extent or range of the 
environmental differences existing in or operating on a given popula- 
tion. The concepts of potency and of determining influence, when
applied to environmental differences, are wholly misleading. These
concepts together with the accumulating evidence that only a small 
proportion of the differences in intelligence is to be attributed to 
environmental differences suggest the totally false conclusion that all 
educational procedures and all attempts to improve the environments 
are unimportant. Ability to score on an intelligence test is quite 
obviously a matter of schooling and of intellectual stimulation. But, 
if all children receive essentially the same intensive schooling and 
essentially the same high level of intellectual stimulation, then essen- 
tially all of the differences in intelligence must be attributed to differ- 
ences in heredity. One of the most effective methods of raising the 
general intelligence of the population is through a leveling-up of
environmental influences. In proportion as this leveling-up process is 
successful, in the same proportion will environmental differences 
disappear. Hence, the more successful the environmental attack, 
the smaller the contribution of environmental differences to individual 
differences in intelligence. Instead of developing an inferiority 
complex from the evidence that not more than twenty per cent of 
individual differences in intelligence are to be attributed to environ- 
mental differences, teachers, educators and social workers should 
make it their business to reduce the contribution of environmental 
differences to the vanishing point. 

It is to be emphasized that this point of view is a two-edged 
sword. Assume that eighty per cent of the differences in general 
health, physical vitality, and nutritional status among children are 
to be attributed to environmental differences. This might be inter- 
preted as evidence that environmental differences exert a determining 
influence on physical well-being, but it would be more correct to con- 
clude that the range or extent of the environmental differences which 
are important for health is very large and that some children are 
receiving excellent care whereas others are receiving grossly inadequate 
care. Instead of being gleeful to discover the potency of their tools, 
every nurse, dietician, physician, and public-health officer should be 
chagrined to learn that the vast majority of children are receiving a 
type of care which is far below the best that is available. 

6. Putting together these last two implications leads to a restate- 
ment of the whole problem. The problem is no longer the relative 
potency and determining influence of heredity and environment; 
instead, it is a problem of determining whether the differences in 
heredity existing in a given population are large or small relative to 
the range of the existing differences in the environment which are 
operating on that population. Assume that eighty per cent of the 
differences in the intelligence of American children are due to hered- 
itary differences and that twenty per cent are due to environmental 
differences; this means that the hereditary differences are relatively 
large (reflecting probably the polyglot nature of our population) while
the existing environmental differences are relatively small (reflecting 
doubtless the essential equality of the environment provided by 
universal public education). If investigation of the problem in another 
society should reverse these percentages, then the only meaningful 
conclusion would be that in such a society the range of the existing 
hereditary differences is small relative to the range of the existing
environmental differences. The relativity of the solution must be 
emphasized. At the end of an extreme eugenic attack designed to 
bring all adults to an ideal stature, the existing environmental differ- 
ences might be no greater absolutely than at the beginning; at the 
same time they might be relatively small at the beginning and rela- 
tively large at the end in relation to the existing hereditary differences.

7. The next important implication concerns the correlation
between the quality of endowments and of environments. Do 
children with superior native endowments tend to be born into superior 
environments? Do children with inferior native endowments tend 
to be born into inferior environments? The social significance of 
this question may be suggested by the fact that the eugenists who
are alarmed at the high birth rates among the under-privileged and 
the low birth rates among the professional classes are assuming that 
the correlation between the quality of endowments and of environ- 
ments is high. The implications of this correlation can be most easily 
indicated by describing the nature of the societies in which the correla- 
tion would be high, zero, or negative. First, a high correlation in 
respect to intelligence and health would be expected in a society in 
which the shrewd and the strong take possession of the intellectual 
and material wealth and provide their children with both superior 
endowments and superior advantages while the morons and weak 
are left in poverty and ignorance to provide their children with both 
inferior endowments and inferior care. In such a society the individual
differences in intelligence and health should be very large and a large 
proportion of these differences should be attributed to the fact that 
nature and nurture work together in one case to increase intelligence 
and health and in another case to decrease them. Ancient Greece 
or France before the Revolution would provide excellent examples, 
provided that it could be assumed that the ruling classes were natively 
superior and that the slaves and the rabble were natively inferior. 
Second, a low or essentially zero correlation between the quality 
of endowments and of environments would be expected in a society 
in which all children receive a high level of universal education and 
universal health service. In such a society differences in intelligence 
and health should be much smaller than in the first type of society, 
the contribution of this correlation and of environmental differences 
to the variance in intelligence and health should be comparatively 
small, the contribution of hereditary differences should be com- 
paratively large, and the average level of health and intelligence
given equal average native endowments should be very much higher 
than in the first type of society. Enormous strides have been made 
toward this type of society during the last century. 

Third, a negative correlation between endowments and environ- 
ments would be expected in a society in which children with inferior 
native endowments received every advantage while children with 
superior native endowments were compelled to shift for themselves. 
Given a one-to-one negative correlation and a nice balance between 
existing hereditary differences and existing environmental differences, 
it would be possible to achieve a dead level of mediocrity in the midst 
of hereditary and environmental differences. The tendency of the 
traditional school to ignore the gifted child and to concentrate on the 
dull normals constitutes a miniature reflection of the third type of 
society. 

8. The implications of solutions of the problem which give con- 
trasted values for the accidental and intra-family environmental 
differences also require comment. The available evidence indicates 
that the contribution of these factors to variance is considerably 
less than ten per cent for such variables as physical measures and 
intelligence, whereas in the case of personality differences it may be
as high as forty per cent or fifty per cent. In part the explanation of 
this contrast may be that in the case of physical measures and intelli- 
gence, identical twins reared in the same family react in the same way 
to similar physical environments and to similar intellectual environ- 
ments, whereas in the case of personality development twins do not 
react to similar environments but each must react to the other. In
part also these figures suggest that hereditary differences contribute very
little to personality differences. 

9. The nature versus nurture problem has been defined in terms 
of the conditions prevailing in a given society of human beings at a 
given time. Such a realistic formulation is the only one which can 
suggest methods of social control. It is possible that eighty per cent 
of the differences among children in general health, physical vitality, 
and nutritional status are to be attributed to environmental differences, 
while only twenty per cent are to be attributed to hereditary differ- 
ences. Assume this to be the situation; then it follows (a) that there
exist very large differences in the kind of physical care, the adequacy 
of diet, and the quality of medical attention received by children; 
(b) that the hereditary differences in respect to physical well-being 
are relatively small in comparison with the large environmental
differences; (c) that in general much can be done to improve the con- 
dition of children who are under par for the simple reason that the 
vast majority have been receiving inadequate care; and (d) that the 
next step in improving the average level of the health of the whole 
population should be through improvements in the environment. 
Assume an effective and vigorous environmental attack prosecuted 
over a period of years until a uniform and very high level of physical 
care for all children is attained, and assume that the hereditary factor 
is neglected, then at the end of this period it would follow (a) that the 
environmental differences would be very small or non-existent; (b) 
that very little could be done to improve the condition of children
who were under par since all are receiving and have been receiving a 
high level of excellent care; (c) that essentially all the differences 
in general health, vitality, and nutritional status should be attributed 
to hereditary differences; (d) that, in the initial years of such an 
effective program, there should be a very marked improvement in the 
average health and physical vitality of all children; (e) that the hered- 
itary differences might be considerably larger and the average level 
of inborn vitality considerably lower than when the experiment 
started, particularly if the environmental attack continued over a 
number of generations; and (f) that the next step in improving the 
average level of general health should be through improvements in 
the hereditary qualities of the population. 
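Implications 4, 5, and 9 turn on the point that a variance share reflects the range of differences present, not the potency of a factor. A minimal sketch, assuming uncorrelated additive hereditary and environmental parts and an arbitrary starting split of ninety to ten, shows the shares moving while the effect of each unit of heredity or environment stays fixed.

```python
def hereditary_share(var_h: float, var_e: float) -> float:
    """Share of var(h + e) attributable to h, assuming h and e are
    uncorrelated and additive (a deliberate simplification)."""
    return var_h / (var_h + var_e)


# Illustrative start: ninety per cent of the variance is hereditary.
var_h, var_e = 9.0, 1.0
print(hereditary_share(var_h, var_e))            # 0.9

# Point 4: a eugenic attack shrinks the hereditary range a hundredfold.
# Each unit of h still adds one unit to z; only the range has changed.
print(hereditary_share(var_h / 100, var_e))      # about 0.083

# Points 5 and 9: a levelling-up shrinks the environmental range instead.
print(hereditary_share(var_h, var_e / 100))      # about 0.999
```

After the hundredfold shrinkage of the hereditary range, the environmental share rises from 0.1 to about 0.92 although the environmental differences themselves are untouched, which is the relativity stressed in implication 6.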
EFFICIENCY PLUS ECONOMY IN SCORING AN
INTEREST TEST 


E. K. STRONG, JR., AND H. D. CARTER 


Stanford University 


Experience has shown that the vocational interest inventory can 
be rendered most effective by using scoring techniques in which items 
are weighted in proportion to their diagnostic value. In the past 
this has been accomplished by means of a formula devised by Kelley, 
and described in several publications.2* Recently Kelley* has 
published a new formula, undoubtedly superior to the old one, which 
he no longer recommends. Both the formulae result in scoring 
devices in which a wide range of item weights (e.g., 30 to −30) or a
narrow range of item weights (e.g., 4 to −4) may be used.

This study is intended to answer two questions which have been 
raised concerning the effect of variations in the scoring procedure. 
The first question is concerned with comparison of results obtained 
by the old formula with similar results obtained by the new and 
improved formula. The second question, which rests upon practical 
grounds, is concerned with the effect of reduction of the range of 
scoring weights. Persons working with tests such as the Strong 
Vocational Interest Blank will want to know the extent of the changes 
resulting from adoption of the new formula; they will also want to 
know whether reduction of the range of scoring weights can be accom- 
plished without serious loss in efficiency of the instrument. The 
present study is an attempt to answer these questions by applying 
three methods to the same body of data. 


THE FUNDAMENTAL TECHNIQUES 


The formulae and the procedures involved in developing scoring 
scales for the interest test have been described in several publica- 
tions'**4.5; hence only a brief account is needed here. For each 
test-item, the number of persons in two contrasted groups responding 
in a particular way, and the number responding differently, are 
recorded; the results are reduced to a semi-equalized four-fold table, 
such as Table I. From this semi-equalized four-fold table, the 
product-moment correlation coefficient, phi, can be computed, as a 
measure of the value of that item for differentiating between the par- 
ticular group and the general group. The appropriateness of this 
procedure has been discussed by Kelley.‘ All the items in the test 
are treated in this way, and are assigned appropriate weights in a
scoring key intended to measure the distinctive interests characteristic 
of the particular occupational group. 


TABLE I.—EXAMPLE OF SEMI-EQUALIZED FOUR-FOLD TABLE, SHOWING RESULTS
FROM A SINGLE TEST-ITEM

                           Alternative responses
                     Percentage     Percentage
                       liking       not liking      Total
[illegible]               9             91           100
[illegible]              25             75           100
Total                    34            166           200

According to the older method, the item-weight w is obtained 
by Formula 1, 


$$w = \frac{\phi}{\sigma\,(1 - \phi^{2})} \qquad (1)$$
in which $\phi$ is the coefficient of correlation from the four-fold table,
and $\sigma$ is the standard deviation of the item-variable. The value
$(1 - \phi^2)$ is proportional to the square of the standard error of the
coefficient phi. The item-weights so derived are used in measuring 
the interests of the particular occupational group. 
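The phi coefficient for a single item can be computed directly from a semi-equalized four-fold table, because each row has already been scaled to 100. The sketch below uses the usual formula for phi and the percentages of Table I; since the row labels of that table are illegible in the scan, which group is placed first (and hence the sign of phi) is an assumption, and only the magnitude of about .21 should be read from it.

```python
import math


def phi_coefficient(a: float, b: float, c: float, d: float) -> float:
    """Phi (product-moment) correlation for the four-fold table

        first group:   a liking,  b not liking
        second group:  c liking,  d not liking
    """
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Percentages from Table I (each row already sums to 100).
phi = phi_coefficient(9, 91, 25, 75)
print(round(phi, 3))   # -0.213
```

Swapping the two rows changes only the sign, which is why the orientation of the table has to be fixed before the weights for a scoring key are read off.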
Formula 2, more recently published by Kelley‘, is an improvement,
because it uses in the denominator a value proportional to the standard
error of $\phi/\sigma$ instead of that of phi.


W = [illegible in the source scan]      (2)


The meaning of the notation in Formula 2 is fully discussed by Kelley.‘ 
The formula itself (his Formula 11) is given here in order that the 
reader who consults his article may see exactly which formula is 
under discussion in the present paper. Note that the value 4N may 
be dropped, or replaced by a constant, without change in the efficiency 
of the formula. Likewise, in Formula 1, a constant multiplier may 
be used. 
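The remark that a constant multiplier leaves efficiency unchanged can be illustrated directly: multiplying every weight in a key by the same positive constant multiplies every score by that constant, so the two sets of scores correlate perfectly and rank the blanks identically. The weights and response patterns below are hypothetical.

```python
def score(marks: list[int], weights: list[int]) -> int:
    """Total score: sum of the weights of the responses marked (1) on a blank."""
    return sum(m * w for m, w in zip(marks, weights))


def pearson(x: list[float], y: list[float]) -> float:
    """Plain product-moment correlation between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


weights = [3, -2, 5, 0, -4]          # hypothetical item weights
scaled = [10 * w for w in weights]   # same key times a constant
blanks = [[1, 0, 1, 1, 0], [0, 1, 0, 1, 1], [1, 1, 1, 0, 0], [0, 0, 1, 1, 1]]

s1 = [score(b, weights) for b in blanks]   # [8, -6, 6, 1]
s2 = [score(b, scaled) for b in blanks]    # [80, -60, 60, 10]
print(pearson(s1, s2))                     # 1.0 up to rounding
```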

The computation of scoring weights for many items would be 
laborious by either method, particularly the new method, if one were 
to do all the computing by ordinary means. However, Strong’ has 
prepared charts or nomograms, which make it possible to secure the
item weights with ease and rapidity. One merely reads the entry in 
the chart corresponding to the values shown by the data in the semi- 
equalized four-fold table. It is therefore a simple matter to prepare 
scoring keys using the weighting procedures devised by Kelley. The 
method has been found very convenient by Strong,* Bernreuter,! 
and others who have used it. It should be emphasized that adoption 
of the newer and more complicated formula need not increase the 
labor of determining the item weights. 

The range of item weights is another matter. In each of the 
formulae there are constants, which may be altered without changing 
the diagnostic significance of the weighting system. Changing the 
constants will change the numerical magnitude of the weights, and 
hence affect the absolute range of weights. Restriction of the range 
of weights may be accomplished by substitution of a proportional 
set of values; this amounts to use of broader categories, and hence 
fewer of them. No one would think of using decimals to preserve the 
perfect continuity and the finer steps in weights; it is evident that 
in the interests of economy in scoring one should adopt the smallest 
range of weights which may be used without loss in efficiency. There 
are persons who object to the whole procedure, feeling that weighting 
of items is unnecessary and laborious; however, those who have 
adopted the scoring devices and methods described by Strong* have 
found the scoring both convenient and rapid; moreover, the diagnostic 
efficiency of a test of given length is known to be greatly increased 
by use of the differential weighting procedure for the scoring of items. 
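One simple way to carry out such a proportional substitution is a linear rescaling followed by rounding. The paper does not state how its nine-step weights were derived, so the mapping below, from the range plus fifteen to minus fifteen onto the integers one to nine, is an assumption for illustration only.

```python
def restrict_range(weight: int, old_min: int = -15, old_max: int = 15,
                   new_min: int = 1, new_max: int = 9) -> int:
    """Rescale a weight from [old_min, old_max] onto the integer steps
    new_min..new_max and round to the nearest step (hypothetical scheme)."""
    fraction = (weight - old_min) / (old_max - old_min)
    return round(new_min + fraction * (new_max - new_min))


wide = list(range(-15, 16))                  # 31 possible weights
narrow = [restrict_range(w) for w in wide]   # broader categories

print(sorted(set(narrow)))                   # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(restrict_range(-15), restrict_range(0), restrict_range(15))  # 1 5 9
```

Each of the nine steps absorbs three or four of the original thirty-one weights, which is the "broader categories, and hence fewer of them" described above; the order of the weights is preserved, only their resolution is coarsened.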

Formula 2 is intended to replace Formula 1. Since the latter
has been widely used, and is now to be discarded, it seems advisable to
carry out a quantitative investigation of the differences resulting 
from the change. It is likely that this cannot be done in a blanket 
fashion for all different tests such as the Bernreuter and the Strong; 
studies should be carried out for each instrument dealing with par- 
ticular subject-matter. In the present study, an investigation has 
been made of the effects of changing to the new formula, and secondly 
of the effects of reducing the range of weights. 


THE DATA 


Strong Vocational Interest Blanks which had been filled out 
by men in each of five criterion groups furnish the data for this report. 
For each of the four hundred twenty items in the interest blank, 
responses were tabulated separately for one hundred engineers,
one hundred chemists, one hundred life insurance salesmen, one 
hundred lawyers, and one hundred ministers. The results of these 
tabulations are basic to all the analysis reported in this paper. 

Using these tabulations, three different sets of scoring stencils 
were prepared for each of the five types of occupational interest. The 
first set of stencils made use of Formula 1, and employed item-weights 
ranging from plus fifteen to minus fifteen. The second group of 
stencils was based upon Formula 2; here also item-weights ranging 
from plus fifteen to minus fifteen were used. The third set of scales 
also used Formula 2, but the range of item-weights was restricted 
to nine steps, from one to nine inclusive. The interest blanks of 
the five groups were scored by each of the three scales, for each of 
the five types of interest. In this way, comparative data concern- 
ing the diagnostic efficiency of the three procedures were secured. 


RESULTS 


Table II shows the results of comparison of different occupational 
groups by means of the three sets of scales. The critical ratios 
presented in that table furnish the comparisons needed for answering 
the questions raised in this report. Since the purpose of the test is 
differentiation between the occupational groups, the procedure 
here applied bears directly upon the efficiency of the three scoring 
techniques. Inspection of the values in Table II indicates first that 
Kelley’s new formula is very slightly superior to the old one, and second 
that reduction of the range of weights from thirty-one points to nine 
points produces a negligible loss in efficiency. 


DISCUSSION 


In considering the results of Table II, several facts must be kept 
in mind in order to preserve a correct view of the situation. In 
the first place, it must be apparent that slight differences are to be 
expected, rather than gross differences. This is true because the 
three scales depend upon fundamentally similar treatments of the 
same basic tabulations. The refinement which is present in the new 
Kelley formula would be more in evidence in results of a test nearer 
the borderline of effectiveness. The differences between the interest 
scores of occupational groups are so great that the earlier formula
was adequate when applied to results of the Strong Vocational Interest
Blank.

Scores of one hundred engineers on the three engineering scales
have been intercorrelated, as an example, to indicate the extent of the 
agreement in correlational terms. The correlation is .987 between 
the scales based upon the old method and the new method with the 
larger range of weights. The correlation is .985 between the scales 
based upon the old method and the new method with restricted range
of weights. The coefficient is .988 between the two scales based
upon the new method, one with the larger range of weights and the
other with the restricted range of weights. These correlations indi-
cate that the scales developed from the same basic data, but differing
in the scoring procedures under consideration, measure the same thing.
Scales with such high intercorrelations cannot be expected to behave
differently.

TABLE II.—CRITICAL RATIOS INDICATING THE COMPARATIVE DIAGNOSTIC
EFFICIENCY OF THE STRONG VOCATIONAL INTEREST SCALES BASED
UPON THREE DIFFERENT SCORING PROCEDURES. ONE
HUNDRED INDIVIDUALS IN EACH CRITERION GROUP

                               Chemist   Chemist   Chemist Life ins. Life ins.    Lawyer
                              vs. life       vs.       vs.       vs.       vs.       vs.
                                  ins.    lawyer  engineer    lawyer  engineer  engineer
Engineer scale:
  1. Old method, 15 to −15       15.30      9.66      6.28      4.73     21.31     15.30
  2. New method, 15 to −15       15.80     10.36      6.03      4.31     21.68     15.82
  3. New method, 1 to 9          15.20     10.27      6.03      4.09     21.18     15.91
Chemist scale:
  1. Old method, 15 to −15       27.65     17.59      5.67      7.40     21.00     11.99
  2. New method, 15 to −15       28.57     17.99      5.54      6.88     21.78     12.60
  3. New method, 1 to 9          27.65     18.19      6.07      5.88     21.12     12.75
Life insurance scale:
  1. Old method, 15 to −15       27.35      9.47      3.22     14.92     23.74      6.50
  2. New method, 15 to −15       27.76     10.56      3.36     14.84     23.61      7.23
  3. New method, 1 to 9          26.71     10.78      3.08     12.32     23.07      7.75
Lawyer scale:
  1. Old method, 15 to −15        4.06     17.17       .18     12.91      4.20     17.20
  2. New method, 15 to −15        4.59     17.07       .07     12.21      4.58     17.19
  3. New method, 1 to 9           4.80     16.33       .08     11.36      4.84     16.59
Minister scale:
  1. Old method, 15 to −15         .10      1.88      3.66      1.80      3.79      5.57
  2. New method, 15 to −15         .47      1.91      3.42      1.53      4.11      5.38
  3. New method, 1 to 9            .86      2.66      3.00      1.92      4.08      5.89

Table II furnishes thirty comparisons to indicate the relative 
efficiency of the old formula as compared with the new formula. Of 
these, seventeen comparisons are in favor of the new formula, and 
thirteen favor the old formula. Likewise, a similar number of com- 
parisons indicates the relative efficiency of the new scale with the
larger range of weights and the new scale with the restricted range of 
weights. Of these, seventeen comparisons favor the larger range of 
weights, and thirteen favor the results with the restricted range. 
There is apparently no great difference in efficiency. 
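The two tallies just described can be recomputed from the critical ratios of Table II, counting the larger ratio as the better result. The column order below follows our reading of the garbled column headings in the scan and should be treated as an assumption; the tally itself does not depend on it.

```python
# Critical ratios transcribed from Table II.  Column order (as read from the scan):
# chemist vs. life insurance, chemist vs. lawyer, chemist vs. engineer,
# life insurance vs. lawyer, life insurance vs. engineer, lawyer vs. engineer.
# Rows per scale: (old 15 to -15, new 15 to -15, new 1 to 9).
TABLE_II = {
    "engineer": ([15.30, 9.66, 6.28, 4.73, 21.31, 15.30],
                 [15.80, 10.36, 6.03, 4.31, 21.68, 15.82],
                 [15.20, 10.27, 6.03, 4.09, 21.18, 15.91]),
    "chemist": ([27.65, 17.59, 5.67, 7.40, 21.00, 11.99],
                [28.57, 17.99, 5.54, 6.88, 21.78, 12.60],
                [27.65, 18.19, 6.07, 5.88, 21.12, 12.75]),
    "life insurance": ([27.35, 9.47, 3.22, 14.92, 23.74, 6.50],
                       [27.76, 10.56, 3.36, 14.84, 23.61, 7.23],
                       [26.71, 10.78, 3.08, 12.32, 23.07, 7.75]),
    "lawyer": ([4.06, 17.17, 0.18, 12.91, 4.20, 17.20],
               [4.59, 17.07, 0.07, 12.21, 4.58, 17.19],
               [4.80, 16.33, 0.08, 11.36, 4.84, 16.59]),
    "minister": ([0.10, 1.88, 3.66, 1.80, 3.79, 5.57],
                 [0.47, 1.91, 3.42, 1.53, 4.11, 5.38],
                 [0.86, 2.66, 3.00, 1.92, 4.08, 5.89]),
}


def tally(first: int, second: int) -> tuple[int, int, int]:
    """Count comparisons won by procedure `first`, won by `second`, and tied,
    treating the larger critical ratio as the better result."""
    wins_first = wins_second = ties = 0
    for rows in TABLE_II.values():
        for a, b in zip(rows[first], rows[second]):
            if a > b:
                wins_first += 1
            elif b > a:
                wins_second += 1
            else:
                ties += 1
    return wins_first, wins_second, ties


print(tally(1, 0))  # new vs. old formula:       (17, 13, 0)
print(tally(1, 2))  # wide vs. restricted range: (16, 13, 1)
```

The comparison of old and new formulae reproduces the seventeen to thirteen split. The range comparison yields sixteen to thirteen with one tie (6.03 against 6.03 on the engineer scale, chemists versus engineers), so the count of seventeen in the text presumably assigns that tie to the larger range, unless a digit was misread in the scan.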

The outstanding fact to be gleaned from Table II is that the 
critical ratios are all of the same order of magnitude when the three 
different scales are employed in differentiating between any given 
pair of occupational groups. In no instance does the variation in 
procedure indicate a great gain or loss; not once does an insignificant 
relationship become reliable when the new procedure is used. Not 
once does a large and reliable difference shift to an insignificant one. 
The results suggest that the three scales might be used interchangeably 
for all practical purposes. 

These findings might well be expected in view of some statistical 
considerations. Item responses are unreliable when taken alone, 
and the unreliability of item responses is largely eliminated from 
results based upon large aggregates of items. In a very short test, 
the gains resulting from use of the new formula might be very impor- 
tant; in a test as long as the Strong Vocational Interest Blank (four 
hundred twenty items), errors in individual item-weights resulting 
from slightly less refined weighting procedures become less important. 
The purpose of the test has been to differentiate between occupations; 
for this purpose, the test has been conspicuously successful, even when 
the less refined method has been used. Hence even though the new 
formula is better, it need not necessarily result in large gains in this
situation. The comparative excellence of the new formula is not 
challenged in a situation wherein the older formula was adequate. 


FURTHER CONSIDERATIONS 


The results reported here lead one naturally to the question whether 
still further short-cuts can be adopted without loss in efficiency. At 
least two earlier studies have contributed data bearing upon the 
question; these, combined with results in the present paper, suggest 
the answer. 

Strong® tried out the procedure of reducing all weights to the 
values of one, zero, and minus one. Using this method, he devised 
scales, which he called unit scales, for measuring the interests of law- 
yers, accountants, and architects. The correlations between these 
scales and the standard scales ranged from .87 to .985 and averaged 
.927. The reliability coefficients for the unit scales were slightly 
higher than those for the standard weighted scales. However, the
unit scales did not differentiate between occupational groups as well 
as the standard scales; in fact, the amount of overlapping between 
groups was markedly increased. Hence it must be concluded that 
such drastic reduction of weights is unsatisfactory, since it resulted 
in great loss of efficiency with respect to the primary purpose of the test. 

Strong and Green⁸ considered several short-cuts. The method 
of scoring only likes and ignoring the other two types of responses 
was found quite unsatisfactory. Similarly, scoring likes and dislikes 
but ignoring indifferences was unsatisfactory, for although the scores 
showed high correlations with those secured by the standard procedure, 
the differentiation of occupational groups was not as good. Strong 
and Green also tried reducing the weights to the range from zero to 
nine, as suggested by Birnberg and Rosenstein. Correlations of the 
resulting scores with those obtained by the standard method averaged 
.984. The exact test of efficiency in differentiating between groups 
was not applied, but their work did indicate that the technique sug- 
gested by Birnberg and Rosenstein would be satisfactory. This 
method had been recommended as a saving in labor of scoring by hand, 
but it was found to be more laborious than the standard scoring 
procedure, principally because the change to all positive weights 
shifted the position of most frequent weights to larger values. 

It must be quite apparent that the range of weights can be reduced 
up to a certain point without loss, and that further reduction is 
not advisable. In the present study, it has been shown that adoption 
of weights ranging from one to nine results in a negligible loss in 
efficiency. The fact that there was a slight (though negligible) loss 
suggests that further reduction would not be desirable. 

It might also be pointed out that further reduction would not 
significantly reduce labor in scoring. When the Hollerith technique 
is used, a single column of the card is sufficient when weights of one 
to nine are employed. When hand scoring is to be done, stencils 
can be made using weights from minus four to four, which are equiva- 
lent to the Hollerith weights from one to nine. When this is done, 
all the values to be tapped out are small, and experienced scorers find 
that the mental task of perception, rather than the motor task of 
tapping, is the factor limiting scoring speed. With this range of 
weights, seven or eight blanks can be scored by hand in an hour, 
using one scale. The important point is that increased speed of 
scoring can probably not be secured through further reduction of 
the range of weights. 
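The equivalence between hand-scoring weights of minus four to four and Hollerith weights of one to nine amounts to adding five to every weight. A minimal sketch, with invented weights for a single blank, shows that the shift raises a blank’s total by five times the number of items scored. Assuming, as here, that every blank is scored on the same number of items, the two totals therefore rank all blanks identically. The function names are illustrative only.

```python
def to_hollerith(weight):
    """Map a hand-scoring weight in -4..4 to the equivalent Hollerith weight in 1..9."""
    if not -4 <= weight <= 4:
        raise ValueError("hand-scoring weights must lie between -4 and 4")
    return weight + 5

def total(weights):
    """Score of one blank on one scale: the sum of the weights credited to it."""
    return sum(weights)

# Hypothetical weights credited to one blank on one scale.
hand = [3, -2, 0, 4, -4, 1]
holl = [to_hollerith(w) for w in hand]

# Shifting each of n item-weights by 5 shifts the total by 5 * n, so when
# every blank is scored on the same n items the rank order is unchanged.
assert total(holl) == total(hand) + 5 * len(hand)
print(total(hand), total(holl))
```

The same arithmetic explains the Birnberg and Rosenstein result: making all weights positive moves the most frequent weights to larger values, which adds tapping without changing the ranking.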
SUMMARY AND CONCLUSIONS 


Using data from five hundred persons in five occupational groups, 
an empirical study has been carried out to measure the comparative 
efficiency of three different procedures for scoring an interest test. 
The results indicate the following conclusions: 

1. The fundamental aspect of the procedure is that of weighting 
each test-item in proportion to its diagnostic significance. The effects 
of changing from Formula 1 to Formula 2, and the effects of restricting 
the scoring weights to a nine-point range, are relatively slight. This 
is true for this test, probably partly because of the large number of 
items included. 

2. The new formula recently provided by Kelley⁴ has proved 
slightly superior to the similar, older formula. The new formula 
should be adopted for use in devising scoring scales for the interest test. 

3. Reduction of the scoring weights to a range from one to nine 
points has not caused any appreciable loss in effectiveness. Since 
this reduction of range of weights leads to a considerable saving of 
labor in scoring, it should be adopted as standard procedure. 

4. Workers may feel confident that diagnostic results secured 
after adoption of the new formula with the restricted range of 
item-weights will be in agreement with similar results secured by the 
old method. 


REFERENCES 


1. Bernreuter, R. G.: The Evaluation of a Proposed New Method for Constructing 
Personality Trait Tests. Ph. D. Thesis, Stanford University, 1931. 

2. Cowdery, Karl M.: “Measurement of Professional Attitude.” Jour. of 
Personnel Research, Vol. V, 1926, pp. 131-141. 

3. Fryer, Douglas: The Measurement of Interests. Henry Holt & Company, New 
York, 1931, pp. xxxvi and 488. 

4. Kelley, T. L.: “The Scoring of Alternative Responses with Reference to Some 
Criterion.” Journal of Educational Psychology, Vol. XXV, 1934, pp. 504-510. 

5. Strong, E. K., Jr.: “Procedure for Scoring an Interest Test.” Psychol. Clinic, 
Vol. XIX, 1930, pp. 63-72. 

6. Strong, E. K., Jr.: Manual for the Vocational Interest Blank. Stanford 
University Press, 1935. 

7. Strong, E. K., Jr.: Chart for the Computation of Weights for Interest Test Items. 
Stanford University: I, 1930; II, 1935. (Photostatic copy available from 
the author.) 

8. Strong, E. K., Jr. and Green, Helen J.: “Short Cuts to Scoring an Interest 
Test.” Jour. Applied Psychol., Vol. XVI, 1932, pp. 1-8. 








CAN APTITUDE FOR SPECIFIC MUSICAL 
INSTRUMENTS BE PREDICTED? 
CHARLES J. LAMP 
San Francisco Public Schools 
AND 
NOEL KEYS 


University of California 


The investigation of musical talent and the prognosis of success in this field have long been subjects of psychological inquiry.¹ To review the literature on the Seashore Measures of Musical Talent alone would exceed the limits of this article. The criterion of musical attainment employed in such studies, however, has usually been the ability to succeed in courses in music offered in public schools, colleges, or conservatories. In the majority of investigations no clear distinction has been made between musical theory and performance, while few have taken the pains to study separately the determinants of success with instruments of widely differing types. For the most part, vocalists, pianists, violinists, saxophone players, bandmasters, and even potential composers have been indiscriminately pooled to constitute the “musicians” or “students of music” whose performance the tests are used to predict. Is it surprising that findings have been conflicting and results of value for guidance disappointingly meager? 

1 See, for example, the two hundred fifty-eight titles reviewed by Mursell, J. L.: “The Psychology of Music.” Psychological Bulletin, Vol. XXIX, 1932, pp. 218-241. 

In the absence of scientifically validated conclusions, music instructors have been forced to resort to a priori reasoning and uncontrolled observation. Thus it has been suggested that an accurate sense of time and rhythm, while indispensable to performance on percussion instruments, may be of little importance in a vocalist. The sense of pitch is said to be of minor consequence in a pianist, but an “ear” sensitive to fine differences has long been regarded as a sine qua non of the violinist. In the same way, teachers of instrumental music have proposed convenient rules of thumb in the dicta that long, slender fingers make for success on stringed instruments, and that the type of brass horn selected by a player should be based on the relation of lip thickness to size of mouthpiece involved. Authorities do not always agree, however, as to the proper significance of these physical characteristics. For example, two books widely circulated among school bandmasters emphasize that “irregular teeth militate against success with a cup-mouthpiece instrument (brass horn), but are no handicap on reed (wood-wind) instruments,”¹ whereas a third² states unequivocally that “reed instrument players must have even lower teeth.” 

At best, such theories have been unsupported by experimental 
evidence. At their worst, they have served to perpetuate errors and 
enhance the waste characterizing much music instruction. Lacking 
adequate basis for guidance, many a student, after failing miserably 
in his attempt to attain proficiency on a particular instrument, has 
sadly renounced all musical aspirations, when success lay within his 
grasp on some more suitable medium. Others sink years of effort 
and large sums in the pursuit of some form of musical expression in 
which they are foredoomed to mediocrity. At a time when depression 
has curtailed all educational expenditures, it is more than ever desirable 
that means be found of directing pupils instructed at public expense 
to those instruments on which their chances of success are brightest. 

The present article will recount an experimental approach to the 
determination of comparative aptitude for three different types of 
musical instrument, namely brass, woodwind, and string, through a 
system of controlled exposures or tryout instruction in each, followed 
by objective measures of attainment. On the basis of accomplishment 
under these conditions, a study will be made of the validity of certain 
physical and mental characteristics widely employed for the prognosis 
of success in music. 


FACTORS INVESTIGATED FOR PREDICTIVE VALUE 


The factors selected for investigation as to possible prognostic 
value were as follows: 


1. IQ on Terman group test of mental ability. 

2. Pitch discrimination, from Seashore Measures of Musical Talent. 

3. Tonal memory, from the same. 

4. Evenness of teeth. 

5. Length or slenderness of fingers. 

6. Thickness of lip in relation to diameter of mouthpiece for brass horn 
players. 





1 Wright, Z. Porter: What Instrument? Cleveland: H. W. White Co., 1927, 
p. 5. Cf. “Some little fellow with uneven front teeth . . . will probably never 
make the grade (on a cornet) . . . but might be a wonder on the clarinet,” from 
Making the High School Band. Cincinnati: Rudolph Wurlitzer Co., p. 5. 

2 Maddy, J. E.: School Bands: How They May Be Developed. New York: 
National Bureau for the Advancement of Music, p. 10. 
The Terman group test seemed most appropriate for the measurement of intelligence, both because of its special suitability for pupils of the ninth and tenth grades, and because marks earned in courses in musical performance have been found by at least one investigator¹ to correlate more highly with scores on the Terman test (.423) than with scores on the Seashore tests, taken either singly or as a whole (.312). 

Of the Seashore tests, those for pitch and tonal memory were chosen 
for consideration as the two having highest reliability and validity.² 

Evenness of teeth was gauged by means of a scale³ developed for 
this purpose by Dr. Lamp and Professor Francis W. Epley, specialist 
in orthodontics of the College of Dentistry, University of California. 
In the construction of this scale, three photographs of each of sixty-two 
subjects of the present experiment were used. These photographs, 
taken by a professional photographer at thirty-four inches, showed 
successively (1) a front view of teeth parted, (2) a front view of teeth 
contacted as in biting, and (3) a profile view with teeth closed. Ten 
scale values were distinguished on the basis of number of months of 
orthodontic treatment needed for correction, and other criteria. 

Finger slenderness was measured by the ratio of length of middle 
finger to its width at the first joint, using micrometer calipers for this 
purpose. Thickness of lips, likewise, was determined with micrometer 
calipers of the type used by dentists for mouth measurements. 


TRYOUT INSTRUCTION AND CRITERION OF ACHIEVEMENT 


A major difficulty in the derivation of aptitude tests of any type is 
the discovery of a satisfactory criterion of success. When, as in the 
present instance, the desire is to arrive at relative aptitude for different 
forms of musical expression, the difficulty is enhanced. Even a study 
of successful professional violinists, for example, would leave it an 
open question whether the same talents might not have brought equal 
or greater success on some other instrument, had that been attempted 
instead. 

1 Highsmith, J. A.: “Selecting Musical Talent.” Journal of Applied Psychology, Vol. XIII, 1929, pp. 486-493. 

2 Drake, A. M.: “The Validity and Reliability of Tests of Musical Talent.” Journal of Applied Psychology, Vol. XVII, 1933, pp. 447-458. 
Also More, G. V.: “Prognostic Testing in Music on the College Level.” Journal of Educational Research, Vol. XXVI, 1932, pp. 199-212. 

3 For reproduction of this scale, with further account of its derivation, see Lamp, C. J., and Epley, F. W.: “The Relation of Teeth Evenness to Performance on Brass and Woodwind Musical Instruments.” Journal of the American Dental Association, Vol. XXII, July, 1935, pp. 1232-1236. 
The method here employed to overcome this difficulty consisted in: 


1. The selection of a group of beginning high-school students without 
previous experience or training on brass, woodwind, or stringed instruments. 

2. The submission of these students to a succession of controlled exposures, 
or intensive instruction on each of the three types of instrument named. 

3. The administration at the close of instruction on each instrument of an 
objective test of performance, scaled on the basis of the total group, and so 
affording a direct measure of relative success. 


The instructional materials and achievement tests necessary for 
this technique of controlled exposures were developed experimentally 
by the senior author over a period of several semesters. Under the 
procedure adopted, the tryout period on each instrument was confined 
to exactly forty class-hours, distributed over eight weeks. Practice 
between classes was prevented through retention by the experimenter 
of all instruments and the privately-printed scores which constituted 
the sole instructional materials. 

These materials consist of graded exercises especially devised to 
conform to the principles of sound basic to the particular type of 
instrument. Work on the brass horn, for example, begins with mouth- 
piece exercises in lipping, and in production of the harmonic series 
of open tones. All subsequent lessons are developed about these, 
rather than the familiar diatonic or chromatic scales. With woodwind 
instruments, the principle of the lengthening tube is made the basis 
of instruction; and with strings, the production of harmonic intervals 
located by ear. Concepts presented are confined to the minimum 
essential for the performance of simple music on the instrument in 
question. No time is consumed in teaching the names of symbols, 
but direct associations are formed between the sight of the symbol, 
image of the corresponding sound, and the means of producing the 
same. A note on the fourth line of the staff thus becomes for the 
violinist neither D nor re but, for the time being, A3; that is, “A string, 
third finger.” For the player of clarinet, flute, or oboe, the same 
symbol is interpreted R3, or “all holes covered down to and including 
the third finger of the right hand”; and for the player of a treble-clef 
brass horn, 1, that is, “valve 1 depressed, in conjunction with the 
appropriate degree of lip contraction.” Each new item of learning 
is immediately applied in the production of simple melodies; while 
ensemble procedure is used from the outset, to control practice and 
forestall such habits as beating time with the foot. 

The test following each “exposure” is one of sight-reading performance, constructed to sample accurately the instruction received on the instrument in question. Each such test is in five parts, of ten measures and twenty quarter notes each. The test proper is preceded by a practice exercise adequate to familiarize the subject with the rules of procedure, and each part by two unscored measures designed to insure correct starting pitch and finger placement. A time and an error score are obtained. Converted into standard scores and totaled, these constitute the criterion of success for the tryout in question. Despite the speed with which these end-tests can be administered (only five minutes per pupil per instrument), reliability and validity prove highly satisfactory. Table I shows the reliabilities as derived from retests given a small sampling of subjects on successive days, and validity coefficients for the entire experimental group, based on correlation between rank on test scores and rank by the pooled judgments of the other members of their respective training groups.¹ 
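The two scoring steps just described, a criterion formed by totaling standard scores for time and errors, and a validity coefficient based on ranks, can be sketched as follows. The sketch assumes (the paper does not say) that each standard score is signed so that less time and fewer errors count as better, that time and errors are weighted equally, and that the rank correlation is Spearman’s rank-difference coefficient computed without ties. The five pupils, their scores, and the classmates’ ranking are invented for illustration.

```python
from statistics import mean, pstdev

def standard_scores(values, higher_is_better=True):
    """Convert raw values to z-scores; flip the sign when a low raw value is good."""
    m, s = mean(values), pstdev(values)
    sign = 1 if higher_is_better else -1
    return [sign * (v - m) / s for v in values]

def criterion(times, errors):
    """Sum of time and error standard scores, oriented so that higher means better."""
    zt = standard_scores(times, higher_is_better=False)
    ze = standard_scores(errors, higher_is_better=False)
    return [a + b for a, b in zip(zt, ze)]

def ranks(values):
    """Rank 1 = largest value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    r = [0] * len(values)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r

def spearman_rho(rank_a, rank_b):
    """Spearman rank-difference correlation for untied ranks."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical data for five pupils: seconds taken and errors made.
times = [210, 180, 240, 200, 260]
errors = [6, 3, 9, 5, 12]
scores = criterion(times, errors)
test_ranks = ranks(scores)               # [3, 1, 4, 2, 5]
judged_ranks = [2, 1, 4, 3, 5]           # hypothetical pooled ranking by classmates
print(spearman_rho(test_ranks, judged_ranks))  # 0.9
```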


TABLE I. COEFFICIENTS OF RELIABILITY AND VALIDITY FOR THE POST-EXPOSURE 
TESTS ON EACH TYPE OF INSTRUMENT 

Instrument       Reliability     Validity 
Brass horns      .95 ± .015      .87 ± .016 
Woodwind         .96 ± .012      .78 ± .026 
String           .97 ± .009      .85 ± .021 











THE RELATION OF PREDICTIVE FACTORS TO ACHIEVEMENT 


The subjects of the present experiment consisted of the members of 
four successive annual classes in a course in introduction to instru- 
mental music, taught by Mr. Lamp at Polytechnic High School, 
San Francisco. Disregarding those who were segregated because of 
having received previous instruction and those who failed to complete 
at least two of the three tryouts, there remained one hundred fifty-one 
pupils, constituting the experimental group. The great majority 
were members of the ninth grade. 

Before commencement of instruction, six sets of mental and 
physical measurements were made, as described above. Thereupon, 
the three eight-week exposures were conducted, each followed by its 
particular test. For instruction in brass, horns were assigned in 
accordance with the size of mouthpiece with which each pupil succeeded 
best when the mouthpiece was tried apart from the horn. For the work in 
woodwind, clarinets were used exclusively; and for the strings, violins. 
The last two instruments were selected as the most economical and 
representative of their kind, and those which would permit of later 
transfer to other instruments of the same type with minimum loss. 

1 For fuller description of the instructional materials and post-exposure tests, see Lamp, Chas. J.: The Determination of Aptitude for Specific Musical Instruments. Unpublished doctoral dissertation, University of California, 1933. 

Zero-order correlations of the several predictive factors with success 
on the three types of instrument are seen in Table II. The following 
findings may be noted: 

1. Each of the three “mental” measurements, namely, IQ and 
the Seashore measures of pitch discrimination and tonal memory, 
shows positive correlations with performance on each type of 
instrument studied. Of the nine coefficients, six are statistically 
significant (i.e., over four times the PE), but no one alone is high 
enough to be of much practical value for individual guidance. 

2. For all three of these “mental” measurements, the highest 
correlations are with success on brass horns, although the test of 
brass-horn performance is no more reliable than those for other 
instruments (see Table I). 


TABLE II. RAW CORRELATIONS OF EACH OF FIVE MENTAL AND PHYSICAL 
MEASUREMENTS WITH SUCCESS ON THREE TYPES OF MUSICAL INSTRUMENT 
AS DETERMINED BY OBJECTIVE TESTS FOLLOWING EXPERIMENTAL TRYOUTS 

                                          Type of instrument 
Predictive factor               N     Brass horns    Woodwind      String 
IQ on Terman group test         81¹   .33 ± .067     .20 ± .072    .20 ± .072 
Seashore pitch discrimination   93    .49 ± .054     .40 ± .056    .35 ± .061 
Seashore tonal memory           93    .45 ± .056     .24 ± .066    .28 ± .065 
Finger slenderness              50    -.13 ± .094    .14 ± .094    .17 ± .093 
Teeth evenness                  50    -.01 ± .095    .13 ± .094    .16 ± .093 

1 Owing to accidental destruction of the intelligence records of one class, IQ’s were unfortunately not available for all of the subjects. The numbers for particular correlations are further reduced by the fact that many students completed only two of the three tryouts. 


3. Pitch discrimination, as measured by the Seashore test, so far 
from being of unique importance for violinists, appears even more 
essential in brass horn players (r = .49 versus .35). The difference 
between these correlations, amounting to but 1.7 PE, is too small to 
be reliable, but the finding is suggestive. While it is true that a 
violinist must produce notes which are accurate relative to others 
sounded, the player of a brass horn, having no strings of fixed pitch 
as reference points, needs something resembling a sense of absolute 
pitch. The ability required to image a tone accurately before its 
production is doubtless also associated with the trait measured by 
Seashore under the name of “tonal memory.” Among the subjects 
of the present experiment, pitch discrimination and tonal memory 
correlated .41 ± .05. 

4. Contrary to the conclusions of Highsmith, either of the Seashore 
measures studied is found to forecast musical performance more 
closely than do Terman group IQ’s. The maximum difference, 
however, is but twice its PE, and hence inconclusive. 

5. Neither of the physical traits listed reveals any significant 
relationship with success on any of these instruments. Length and 
slenderness of fingers, so far from figuring largely in success on violin, 
shows a correlation of only .17 ± .09 in the case of these beginners. 
This is almost identical with the correlation for teeth-evenness, which 
no one would conceive to be of importance in a violinist. Similarly, 
evenness of teeth appears to have no significant bearing on either 
brass-horn or woodwind playing, despite the confident assertions in 
band and orchestra leaders’ manuals. 

6. Of the physical measurements studied, only lip thickness 
behaved at all as expected. This was found to correlate .28 ± .088 
with diameter of mouthpiece favored by brass-horn players, whether 
degree of success on the end test was held constant or allowed to vary. 
How low this relationship is, however, may be illustrated by the 
fact that the boy having the thickest lips in the entire experimental 
group and the one having the thinnest both became French-horn 
players of marked ability. 
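The significance judgments in this list are made in probable-error units: a coefficient counts as significant when it exceeds four times its PE, and a difference between two coefficients is compared with its own PE. The sketch below applies these rules to the nine “mental” coefficients of Table II. The classical formula PE = .6745(1 − r²)/√N is standard but not stated in the paper; it gives about .053 for r = .49 and N = 93, close to the tabled .054. The PE of a difference is computed as if the two coefficients were independent, which may not strictly hold because they come from overlapping groups of pupils; with that simplification the figure of 1.7 PE given under finding 3 is reproduced.

```python
import math

PE_FACTOR = 0.6745  # one probable error = 0.6745 standard errors

def pe_of_r(r, n):
    """Classical probable error of a Pearson r: 0.6745 (1 - r^2) / sqrt(N)."""
    return PE_FACTOR * (1 - r * r) / math.sqrt(n)

def ratio(r, pe):
    """Size of a coefficient in units of its probable error."""
    return r / pe

def diff_ratio(r1, pe1, r2, pe2):
    """Difference between two r's in PE units, treating them as independent."""
    return (r1 - r2) / math.sqrt(pe1 ** 2 + pe2 ** 2)

# The nine "mental" coefficients of Table II as (r, PE) pairs:
# brass horns, woodwind, string.
table_ii = {
    "IQ": [(.33, .067), (.20, .072), (.20, .072)],
    "pitch": [(.49, .054), (.40, .056), (.35, .061)],
    "tonal memory": [(.45, .056), (.24, .066), (.28, .065)],
}
significant = [r for row in table_ii.values() for r, pe in row if ratio(r, pe) > 4]
print(len(significant))                              # 6
print(round(diff_ratio(.49, .054, .35, .061), 1))    # 1.7
```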


PREDICTIONS OBTAINED FROM A COMBINATION OF TESTS 


The multiple correlations obtained between the most favorable 
combination of the three “mental” measurements and achievement 
on the several instruments may be seen in Table III. Of the three 
types of instrument, it is clear that only with brass horns is the cor- 
relation sufficiently high to be of service in individual guidance. The 
regression equation for predicting success on brass horns is: 


$$X_1 = .49X_2 + .29X_3 + .21X_4 + 22$$
TABLE III. MULTIPLE CORRELATIONS FOR THE PREDICTION OF SUCCESS ON 
VARIOUS INSTRUMENTS FROM KNOWLEDGE OF PUPILS’ SCORES ON SEASHORE 
PITCH AND TONAL MEMORY, AND THEIR IQ’S BY THE TERMAN GROUP 
TEST OF MENTAL ABILITY 

Instruments on which success is predicted     Multiple R 
Brass horns                                   .58 
Woodwind                                      .42 
String                                        .39 


in which $X_2$, $X_3$, and $X_4$ represent, respectively, score on the Seashore 
measure of pitch, score on tonal memory, and Terman group IQ. 
$X_1$ is the predicted score on the test of brass-horn performance, so 
scaled that one hundred indicates average success for the experimental 
group, with 17.5 points the sigma of the group. Scores above one 
hundred may therefore be regarded as distinctly good, while one 
hundred twenty or better denotes a degree of success attained by 
only one in seven of the subjects of the present experiment. 
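Scores on the brass-horn scale (mean 100, sigma 17.5) can be read in sigma units. The sketch below additionally assumes, as the paper does not state, that the scores are normally distributed. Under that assumption a score of 120 lies about 1.14 sigma above the mean, and roughly 13 per cent of a group would exceed it. This is in reasonable agreement with the one subject in seven (about 14 per cent) observed in the experiment.

```python
import math

MEAN, SIGMA = 100.0, 17.5  # scale of the brass-horn performance test

def sigma_units(score):
    """Distance of a brass-horn score from the group mean, in sigmas."""
    return (score - MEAN) / SIGMA

def proportion_above(score):
    """Share of a normal distribution with this mean and sigma lying above score."""
    return 0.5 * math.erfc(sigma_units(score) / math.sqrt(2))

print(round(sigma_units(120), 2))        # 1.14
print(round(proportion_above(120), 3))   # 0.127
```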


SPECIFIC APTITUDES VERSUS GENERAL TALENT 


The extent to which musical aptitude is dependent upon the type 
of instrument attempted is indicated by Table IV. The correlations 
obtained are all significant, and those involving woodwind fairly 
substantial. On the other hand, they are by no means so high that 
success or failure on one medium can safely be taken as a criterion of 
musical talent in general. The comparatively specific nature of the 
abilities may be judged from the fact that, of the subjects in the present 
experiment, not one scored in the highest quarter of the range on all 
three instruments, and only three fell consistently in the lowest quarter. 


TABLE IV. CORRELATION BETWEEN SUCCESS ON INSTRUMENTS OF DIFFERENT 
TYPES AS DETERMINED AFTER CONTROLLED TRYOUTS 

Types of instrument compared       N    Correlation 
Brass horns and string             74   .31 ± .07 
Woodwind and [illegible]           66   .51 ± .06 
Woodwind and [illegible]           83   .57 ± .05 
Had the study been extended to include percussion instruments as 
well, it is entirely possible that one or more of the three failures would 
have proved quite acceptable drummers. 

Finally, it will be noted that the correlations in Table IV are of 
the same general order as those in Table III. Apparently achievement on 
one type of musical instrument can be predicted from performance 
on a different type little or no better than from knowledge of a pupil’s 
IQ and Seashore scores. 


SUMMARY 


A greatly simplified system of instruction has been developed 
whereby pupils can be given effective tryouts on three major types of 
musical instrument, namely brass horns, woodwind, and strings, with 
only forty periods of teaching in each. Brief tests devised to measure 
performance after each such eight-week introduction prove to possess 
uncommon reliability and validity. San Francisco high schools are 
already making extensive use of this short-exposure technique for 
determining musical aptitudes. 

By administering these tryouts to one hundred fifty-one ninth-grade 
pupils under controlled conditions, preceded by various mental 
and physical measurements, the following conclusions have been 
reached: 

1. Neither pitch nor tonal memory as gauged by Seashore tests 
affords an index of aptitude for brass, woodwind, or stringed instru- 
ments which is adequate for individual guidance. The prognosis 
obtainable from Terman group IQ’s is poorer yet. 

2. Teeth evenness and length or slenderness of fingers show no 
significant or appreciable relationship with achievement on any type 
of instrument studied, though considered important by many instruc- 
tors and writers of music manuals. 

3. There appears to be some agreement between thickness of lips 
and diameter of mouthpiece of the brass horn on which an individual 
is most likely to succeed, but the correlation is extremely low (r = .28). 

4. A combination of scores on pitch discrimination, tonal memory, 
and the Terman group intelligence test is found to predict performance 
on brass horns sufficiently well (R = .58) to be of some assistance in 
guidance. 

5. No combination of the mental and physical measurements here 
obtained serves to forecast success on clarinet or violin with a correla- 
tion higher than .42. This is too low for practical use in individual 
prediction. 

6. Correlations between success on instruments of the different 
types studied range from .31 to only .57. Aptitude even for instru- 
mental music seems, therefore, sufficiently specialized that measures of 
musical talent should be validated as far as possible in terms of specific 
forms of expression rather than a hypothetical “general musicality.” 
AN EVALUATION OF SOME PROBLEMS IN THE 
PREDICTION OF ACHIEVEMENT AT THE 
COLLEGE LEVEL 


DANIEL D. FEDER 
Department of Psychology, State University of Iowa 


It is axiomatic that the goal of all scientific experimentation is the 
prediction of occurrence of subsequent related events. We can never 
reach, and, in fact, should never aspire to reach, 100 per cent accuracy 
in educational prediction. Such an end is impossible of achievement 
because, among other reasons: (1) Our measuring instruments do not 
have perfect validity and reliability; (2) we have no way of controlling 
or directly evaluating the effects of motivation; (3) there are subtle 
personality factors characteristic of testees which preclude the 
possibility of achieving “true” measures at each tested performance. 
Furthermore, such a goal is undesirable because if it could be achieved, 
it would result in a completely static system of education. On this 
point Stoddard has said: 


Prediction which begins with ‘‘I think this will happen” and a semester or 
year later ends with “I told you so” is worth nothing; it has merely satisfied 
an intellectual curiosity. If no change has been made in the machinery for 
taking care of differently equipped students; if the instructor has paid no 
attention to these differences; if students are all forced through the same learn- 
ing processes at the same pace on the same content material, it is doubtful 
that the widespread use of prognostic tests can be justified.¹ 


Testifying to this all too frequent abuse, especially in higher education, 
is the large number of prediction studies which have appeared in the 
literature during the last ten years. 

There are two phases of prediction in educational guidance. The 
first is the evaluation of a student’s progress up to the moment of 
testing, and the prediction, in terms of such measurement, of how 
much farther he will progress in a stated interval of time. This is 
substantially the “I told you so” method. The second is the 
diagnosis, by means of appropriate tests, of a particular student’s 
capabilities, difficulties, etc., and the attempt to guide his future study into 
those channels which will yield him the greatest returns for effort 
expended. This is prediction with a purpose—guidance. This is 
prediction of the sort Stoddard had in mind when he wrote: 





1 Stoddard, George D.: Iowa Placement Examinations. University of Iowa 
Studies in Education, Vol. III, No. 2, August 15, 1925. 
Nevertheless good ends will be served by tests which will indicate what 
performance the student will show, assuming a constant educational 
environment. If coefficients of correlation are then lowered (in the most 
advanced colleges) by the simple expedient of not waiting for determinism 
to proceed farther, this fact should be accepted by all concerned as a 
mark of progress.¹ 


In a recent experiment in the teaching of first-year French at the 
University of Iowa, it was discovered that under the experimental 
conditions nearly all the previous prediction coefficients had been 
lowered. The experimental instructional method consisted of organizing the learning materials and processes so that each student might proceed at his own level of ability and at his best rate of speed. A minimum standard of achievement was established as the passing grade. Day-to-day mastery was insured by self-administered, self-scored daily tests on which 90 per cent accuracy was required. 

TABLE I. CORRELATIONS OF THE FOREIGN LANGUAGE APTITUDE TEST WITH TOTAL VOCABULARY, TOTAL COMPREHENSION, TOTAL PRONUNCIATION, AND TOTAL COMPOSITE SCORE, SEMESTERS I AND II 





1 Stoddard, George D.: The use of quantitative measurement in inducting the 
student into the institution of higher learning and in predicting his academic success. 
(In) Yearbook XVIII of the National Society of College Teachers of Education: 
Quantitative Measurement in Institutions of Higher Learning. Chicago: Univer- 
sity of Chicago Press, 1930, pp. ix, 253 (pp. 88-120). 
The Foreign Language Aptitude test of the Iowa Placement 
Examination Series was found to have the best predictive value of any 
of the pre-instructional instruments employed. In Table I are given 
the correlations between this test and total scores in the various skills 
in French. These total scores were secured by combining the raw 
scores for each type of test. Since the items in each test were built 
on the same principles, such unweighted combination offered a simple 
and valid technique. Actually, it resulted in one long test, of graded 
difficulty, and of higher reliability than any one of the tests taken 
alone. . 

With only two exceptions, the prediction coefficients for the 
separate skills of the experimental group are lower than those of 
the control group. The differences are even more pronounced in the 
correlations between the aptitude test and the total semester perform- 
ance. Although none of these differences was found to have absolute 
statistical significance, the fact that these figures are based on matched 
groups lends reliability to the results. 

At first these drops in predictive efficiency were somewhat dis- 
concerting. One wondered if the system of instruction would operate 
to invalidate the pre-instructional information concerning the 
students’ ability and training. Inspection of the data revealed that 
the correlations for the control group included those students who 
received grades of D and F. These, by their low scores, extended
the range considerably beyond that of the experimental group. This 
factor, coupled with the well-known fact that the prediction of failure 
is the most reliable of all, yielded a partial explanation. However, 
when the performance of those for whom failure or very low passing 
would have been predicted was examined, it was found that no more 
than their normal representation made up the group of “Failed” 
and “Incomplete” students in the experimental group. Thus it 
appears that the new method of instruction caused generally improved 
work, and resulted in decreased variability of the group. 
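The partial explanation offered above, that the D and F students widened the control group's range and so raised its coefficients, is the familiar effect of range restriction on a correlation. A small simulation makes it concrete. It is not drawn from the Iowa data: the population correlation of .60, the sample size, and the cut at the predictor's mean are all illustrative assumptions.

```python
import math
import random

def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1935)      # fixed seed so the run is repeatable
rho = 0.60                     # assumed population correlation (illustrative only)
aptitude, achievement = [], []
for _ in range(5000):
    x = rng.gauss(0, 1)
    y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
    aptitude.append(x)
    achievement.append(y)

full_r = pearson_r(aptitude, achievement)
# Keep only cases above the mean on the predictor (a crude "no low scorers" group).
kept = [(x, y) for x, y in zip(aptitude, achievement) if x > 0]
restricted_r = pearson_r([x for x, _ in kept], [y for _, y in kept])
print(round(full_r, 2), round(restricted_r, 2))
```

Under bivariate normality, cutting the predictor at its mean should lower r from about .60 to about .41 in expectation. The exact figures printed depend on the seed, but the drop itself does not.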

In contrast with the traditional situation which would have found 
most, if not all, of the students of low language aptitude in the failure
class, under the system of individualized instruction it became possible 
for them to produce presentable work. Therefore, the significantly 
improved performance of these poorer students, with its concomitant 
reduction of prediction coefficients, may be regarded with considerable 
satisfaction. 
The writer has received requests for information concerning tests or batteries of tests, to be given at the beginning of the first year, which might be used in the prediction of achievement in the full four years of college. Such requests are typical of another fallacy in the use and interpretation of college entrance and qualifying examinations. These examinations serve only in lieu of reliable, comparable measures of previous achievement on the high-school level. Ideally, measurement of growth should be continuous throughout the entire educational career of each person, and comparable from school to school. In the absence of such measurements, the entrance and qualifying examinations usually provide a common basis for the survey of high-school achievement and college aptitude of entering students.

The predictive efficiency of such instruments diminishes slightly 
after the first semester, and very markedly after the first year. Two 
instances are illustrated by Tables II, III, and IV. 

In Table II are presented some other results of the previously cited study. From these correlations it is apparent that first-semester achievement in any skill is far superior to the Foreign Language Aptitude test in the prediction of second-semester achievement. The use of the aptitude test yields coefficients of correlation between .50 and .60 with second-year work, whereas first-year achievement yields a coefficient of .70 or better with second-year work.

Foreign Language Aptitude Test.
Total Comprehension, Semester I.
Total Comprehension, Semester II.
Total Vocabulary, Semester I.
Total Vocabulary, Semester II.
Total Pronunciation, Semester I.
Total Pronunciation, Semester II.
Total Composite Score, Semester II.
In Table III are data based on total grade point averages. The tests involved are the Iowa Qualifying Examinations, given to all freshmen at the beginning of their college career, and the 1932 Sophomore examinations, administered to all sophomores in the Colleges of Liberal Arts and Engineering. The data are based on Liberal Arts students only. Again the previously noted trends are apparent. Examinations may successfully predict first-year achievement, but thereafter the best prediction of future achievement can be made in terms of previous achievement. Despite the unreliability of college marks in general, the combined grade point average seems to be quite stable throughout an individual's total college career. This may be taken to indicate that the majority of students find their stride during their first year in college and tend to maintain it thereafter.

TABLE III. ZERO-ORDER CORRELATIONS OF IOWA QUALIFYING EXAMINATIONS, 1932 SOPHOMORE TESTS, AND GRADE POINT AVERAGES (N = 324)

Variable     2      3      4      5
   1        .74    .57    .51    .50
   2               .57    .58    .51
   3                      .80    .67
   4                             .75

1. Iowa Qualifying Examinations.
2. 1932 Sophomore Tests.
3. First-year Grade Point Average.
4. Second-year Grade Point Average.
5. Third-year, first-semester Grade Point Average.
Further evidence that the predictive contributions of the various examinations are accounted for in the record of achievement is seen from the multiple correlations presented in Table IV. From these it will be seen that the addition of the weight of the examinations does not add significantly to the predictive correlations.
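The same point can be checked independently of Table IV, using the zero-order correlations of Table III and the standard two-predictor formula R² = (r²_y1 + r²_y2 − 2 r_y1 r_y2 r_12) / (1 − r²_12). The choice of criterion and predictors below is an illustration, not necessarily the combination used in Table IV. Second-year grade point average (variable 4) is predicted from first-year average (3) and the Iowa Qualifying Examinations (1).

```python
import math

def multiple_r_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """Multiple correlation of y with predictors 1 and 2, from zero-order r's."""
    r_sq = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_sq)

# Values read from Table III (N = 324).
r_43 = 0.80   # second-year GPA with first-year GPA
r_41 = 0.51   # second-year GPA with Iowa Qualifying Examinations
r_31 = 0.57   # first-year GPA with Iowa Qualifying Examinations

R = multiple_r_two_predictors(r_43, r_41, r_31)
print(round(R, 3))   # 0.803, against r = .80 for first-year GPA alone
```

Adding the examination raises the coefficient from .80 to about .803, which is negligible and consistent with the conclusion drawn from Table IV.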

A frequent error is the use of tests of general intelligence and 
achievement in the prediction of specific subject-matter performance. 
In the original statement of the purposes of placement examinations 
Dean C. E. Seashore emphasized the necessary specificity of such 
examinations.¹ Stoddard subsequently showed this specificity to be
desirable not only theoretically but also in terms of actual tested
performance.²

Batteries combining general and specific tests yield but little 
better prediction and serve chiefly to cloud the issue. If the predictive 
examination is to serve for diagnosis and placement with reference to a 
specific course it must be constructed with a definite regard for these 
purposes. These ends can not be served by a test of general intelli- 
gence or a survey of general achievement. 


SUMMARY 


The function of prediction in education is to facilitate guidance, 
not to achieve rigid determinism. It is desirable to secure the best 
predictions possible in order to make guidance accurate and meaning- 
ful. Definition of the objective is essential in order to secure the most 
fitting predictive instruments. 

In the prediction of achievement, especially at the college level, 
it was found that: 

1. Reasonably high predictive coefficients should logically be 
expected to decrease under the influence of improved instructional 
methods and guidance. 

2. The best basis for prediction is the student's previous record of achievement. In place of such objective cumulative records, pre-instructional tests may be used with profit. Prediction of post-freshman-year achievement from entrance or qualifying examinations constitutes an inefficient application of such tests; where instructional methods, objectives, and content vary in the later years, such usage is actually invalid.

3. Prognostic tests, designed specifically to meet certain subject-matter requirements, have more power for such prediction than do tests of general ability.

¹ Seashore, C. E.: "College Placement Examinations." School and Society, Vol. XX, 1924, pp. 575-578.

² Stoddard, George D.: Iowa Placement Examinations, University of Iowa Studies in Education, Vol. III, No. 2, August 15, 1925.


























TWO TESTS FOR PERSEVERANCE 


WALTER HOUSTON CLARK 
Master in English and Supervisor of Testing at Lenox School, Lenox, Massachusetts 


The following is the account of two tests that, under certain conditions, apparently give a good measure of the trait of perseverance. Not only do the tests promise to be a practical measure of this personality trait, but the results of a recent study of them have interesting implications with respect to the theory that personality traits in general are specific to different situations rather than consistent over a number of varying situations, as will be pointed out.

The findings are those arrived at in a study by the author¹ at
Lenox School, a preparatory school for boys at Lenox, Massachusetts, 
in response to a need for some kind of character test to supplement 
intelligence tests and examinations in selecting candidates for entrance. 
It was felt that, outside of intelligence, perseverance was the most 
valuable trait making for success in college preparatory work. Con- 
sequently this trait was selected for investigation. 

In preliminary experiments the following diagnostic devices were tried and found wanting: a motor-inhibition test, a mathematical test consisting of a tedious amount of long division, a "magic-number square" test, and a questionnaire.² The two tests which showed positive results were, first, a test which involved the building of as many words as possible from a given number of letters;³ second, a test involving a similar process with numbers. In the latter the subjects were directed to take not more than six threes and to combine them by means of addition, subtraction, multiplication, or division in order to build as many numbers between one and one hundred as possible. Both of these tests were administered together, the word-building test being given first, but the subjects were given as long as they liked, and the perseverance score was the number of words or numbers correctly built with a correction for age. This correction could only be roughly approximated because of the small number of cases. It was arrived at by noting the average scores obtained at different ages and augmenting the scores of the younger subjects in proportion to their youth.

¹ The author wishes to acknowledge the cooperation given him by the boys and Faculty of Lenox School, especially the Headmaster, the Reverend George Gardner Monks, who not only helped with the work but supplied many constructive ideas in devising and evaluating the tests.

² A full account of the investigation may be found in an unpublished master's thesis submitted by the author in January, 1935, at the Harvard Graduate School of Education.

³ For a previous study of this technique see an article by Chapman, J. C.: "Persistence, success, and speed in a mental task." Pedagogical Seminary, 1924, pp. 276-284.

In the preliminary investigation the tests were given to the seventy-seven boys, ages twelve to nineteen, attending Lenox School. The boys knew that the testing situation was experimental, except that they were told that the results might possibly have some influence on their being recommended for admission to college. They were also asked to cooperate so that the results might be as useful as possible. Here, of course, the control of motivation was weak, as the incentive varied according to whether the individual wished to be cooperative. The results of the testing were evaluated according to the extent of the agreement of the test scores with the pooled ratings on perseverance of five to eight masters who had lived with and taught the boys for some time. When these ratings were compared with each other, the average single master was found to have a reliability coefficient of r = .68, while the reliability of the pooled ratings, found by applying the Spearman-Brown formula, was .94. The correlation between the Word-building test and the pooled masters' ratings, based on 69 cases, was .21 ± .08, while between the Number-building test and the ratings it was .45 ± .07. When the scores on both tests were combined, the correlation with ratings was only .43 ± .07, which was less than that when the Number-building test was used alone.
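The step from a single master's reliability of .68 to a pooled reliability of .94 follows the Spearman-Brown formula, r_k = k·r / (1 + (k − 1)·r), where k is the number of ratings pooled. This sketch assumes the masters' ratings behave as parallel measures. The article gives the number of raters only as "five to eight", so the values of k tried below are that range and nothing more.

```python
def spearman_brown(r_single: float, k: float) -> float:
    """Reliability of the pooled (summed or averaged) ratings of k parallel raters."""
    return k * r_single / (1 + (k - 1) * r_single)

# Single-master reliability at Lenox was .68; try five through eight raters.
for k in range(5, 9):
    print(k, round(spearman_brown(0.68, k), 3))
```

With r = .68 the formula gives about .91 for five raters and about .94 for seven or eight. The reported .94 is therefore consistent with pooling near the upper end of that range.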

These two tests were checked by administering them at another school.¹ Despite the fact that the administration was cut short, the results were very similar. The Number-building test was again the best, correlating with masters' ratings .44 ± .08. The other correlations were not computed, though inspection of distributions indicated that the results were proportionate to the results of the investigation at Lenox. The reliability of the average single master at South Kent was .76, while that of the pooled ratings was .93.
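The "±" figures attached to these coefficients are probable errors. The sketch below uses the conventional large-sample expression PE_r = 0.6745 (1 − r²) / √N. That is the standard textbook formula, assumed here because the article does not state which expression it used. For the Word-building coefficient of .21 on 69 cases it gives about .078, matching the reported ±.08.

```python
import math

def probable_error_r(r: float, n: int) -> float:
    """Conventional probable error of a Pearson r (0.6745 standard errors)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)

print(round(probable_error_r(0.21, 69), 3))  # about 0.078, reported as .08
```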

These various correlations of test scores with ratings on perseverance were high enough to indicate some validity for the tests and to warrant further investigation with them. The chief source of error seemed to be the variability in motivation: the subjects knew that the testing situation was experimental rather than real. It was decided to try the tests again with the new boys entering the school in the fall, when the desire to do their very best in order to make a good first impression had not yet worn off. The motivation would then be much more stable and the situation very similar to that confronting candidates for admission.

¹ South Kent School, South Kent, Connecticut. The author is indebted to the school for its cooperation, especially to the Headmaster, Mr. Samuel S. Bartlett, and to Mr. Samuel A. Woodward for administering the tests.

Accordingly the two tests were administered, the day before the 
opening of school, to the twenty-four boys newly entering Lenox in 
September, 1934. The Number-building test was put first this time 
to see whether the order of the tests had anything to do with their 
relative efficiency. The Word-building test was also lengthened to 
include three lists of letters instead of one, and words only of four or 
more letters were admitted in the scoring. In December, after three 
months' contact with the boys, the masters were asked to rate them on
perseverance as before, but in two different ways. On one set of 
ratings they were instructed to rate the subjects only on perseverance 
observed in classroom work and the related study; the other set of 
ratings was to include perseverance in manual labor, athletics, and all 
other activities as well as studies. Since the school is one where each 
boy has some sort of manual job to perform to help in the running of 
the school, opportunity for rather broad observation existed. After the 
tests had been scored, rank correlations were computed between the 
ratings and the scores corrected for chronological age as before. On 
the Number-building test one subject did not follow directions, which 
left only twenty-three cases. The following gives the values of rho 
in tabular form: 











                         Classroom perseverance    General perseverance
                         ratings                   ratings
Number-building          .76 ± .06                 .70 ± .07
Word-building            .60 ± .09                 .49 ± .10
Combined tests¹          .77 ± .06                 .70 ± .07

¹ The correlations with the combined tests were found with the multiple correlation formula $R_{1.23} = \sqrt{1 - (1 - r_{12}^2)(1 - r_{13.2}^2)}$, with rho substituted for r.

¹ In this investigation chronological age was found to give a much more satisfactory correction than mental age. This suggests independence of the factors usually measured by intelligence tests.
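The rho values above are Spearman rank-difference coefficients. The minimal sketch below assumes that tied values receive the average of their ranks. It uses the classical formula ρ = 1 − 6Σd² / (n(n² − 1)), which is exact only when there are no ties. The scores and ratings are made-up illustrations, not the Lenox data, and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1          # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Rank-difference correlation: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical test scores and masters' ratings for five boys.
scores = [12, 25, 31, 40, 52]
ratings = [2, 1, 4, 3, 5]
print(round(spearman_rho(scores, ratings), 3))  # 0.8
```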












With proper allowance made for the small number of cases, several observations may be noted with respect to these results. (1) The correlations, especially for this type of personality test, are surprisingly high. As already noted, the average master, in ratings on perseverance, tends to agree with a consensus of ratings to an extent represented by an r of .68 to .76.¹ Judged by the same criterion, then, the Number-building test, requiring about an hour's time for its administration, gauged the perseverance of our twenty-three subjects as accurately as, if not more accurately than, did the average observer who had based his rating on three months' observation. (2) These results would suggest a rather definite repudiation of the doctrine of the specificity of personality traits, especially when the correlations of the test scores with the ratings on general perseverance are considered. Even when one makes rather generous allowance for halo effect in ratings by observers whose primary emphasis is classroom work, there still remains support for this generalization, so far as such speculation may be hazarded on so few cases and only two tests. (3) The results in many respects confirm those of the original testing program. For instance, the Number-building test is again the best, and the combined tests give a measure only slightly better than the Number-building test alone. Evidently Word-building is so similar in principle to Number-building that, despite the rather high correlation of the former with ratings, it is of little value even as a supplementary measure of perseverance. Since one of these tests involves linguistic and the other mathematical operations, this appears to be another bit of evidence against the doctrine of the specificity of traits. However, it may be noted that both tests involve mental operations of a constructive nature. Consequently some new tests are being projected which may utilize more analytical as well as more mechanical activities of mind, in the hope that a supplementary test will be found to increase the accuracy of the Number-building test prediction. (4) The order of giving the tests apparently had little or no effect on their relative efficiency.

As already indicated, the investigation of these tests is still in process. The writer intends to give them at Lenox School each fall for several years until enough cases can be collected, when the tests can be better evaluated and some norms established. The tests with instructions for administration follow. Should any experiment be tried with them, the author would be interested in hearing the results.

¹ It should be borne in mind that values of r and rho calculated from the same data are nearly the same. In general, however, rho tends to be slightly lower.


Directions for Administration of Tests. The administrator should be sure that all subjects have sharpened pencils and erasers. He should also have on hand scrap paper and additional pencils that may be needed.

Before passing out the test papers he should say:

"We have here some tests on which everybody will be able to do a good deal of work because they require no special preparation or training. When the test papers are passed out, fill in the information called for at the top of the sheet and read the directions very carefully. When you get through with one sheet, raise your hand and I will give you another. You can pass in your papers whenever you are through, but first be absolutely sure that you have reached your limit."

The Number-building test should be given first and this test collected before each individual is given the Word-building test. The subjects should be allowed to leave when they are through.
WORD-BUILDING TEST


Name ____________  Age ___ yrs. ___ mos.  Grade ____  Date ________








The important thing on this test is not to give up until you are sure you have 
done all you can. There is certain to be more to do than you think. There will 
be plenty of time but work steadily. RECORD THE TIME YOU BEGIN AND 
END WORK. 


TIME BEGUN ________

DIRECTIONS: Out of the following lists of letters make as many words as you can without using proper names. In any one word, do not use any letter more times than it occurs in the list. Omit two-letter and three-letter words. Two examples are given in the first list.


1. EATRDN





dent rate etc. 





2. LAECKB





3. OREMBUNO 





You may hand in your paper when you are through but BE SURE FIRST THERE 
ARE NO MORE YOU CAN GET. 


TIME ENDED 
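The mechanical part of the scoring rules printed on this form can be checked automatically: no letter may be used more often than it appears in the list, and words must have four or more letters. The sketch below does only that. The function name is hypothetical. Whether a string is a real English word and not a proper name still needs a dictionary or a human scorer, and this sketch does not attempt that check.

```python
from collections import Counter

def fits_letter_list(word: str, letters: str, min_length: int = 4) -> bool:
    """True if `word` is long enough and uses no letter more often than `letters` supplies."""
    word = word.lower()
    if len(word) < min_length:
        return False
    pool = Counter(letters.lower())
    need = Counter(word)
    # A Counter returns 0 for letters absent from the list, so those fail too.
    return all(pool[ch] >= count for ch, count in need.items())

print(fits_letter_list("dent", "EATRDN"))     # True (an example given on the form)
print(fits_letter_list("rate", "EATRDN"))     # True
print(fits_letter_list("red", "EATRDN"))      # False: three-letter words are omitted
print(fits_letter_list("treated", "EATRDN"))  # False: needs two t's and two e's
```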
NUMBER-BUILDING TEST
Name ____________  Age ___ yrs. ___ mos.  Grade ____  Date ________


The important thing on this test is not to give up until you are sure you have 
done all you can. There is certain to be more to do than you think. There will 
be plenty of time but work steadily. RECORD THE TIME YOU BEGIN AND 
END WORK. 


TIME BEGUN ________

DIRECTIONS: This test involves arithmetic. You are to take six threes, which may be combined in any way by means of addition, subtraction, multiplication, or division but by no other means. Find as many numbers as you can between one and one hundred, getting each number in only one way and using not more than six threes. All the numbers between one and twelve can be found, besides many others. Here is a beginning:














1 = 3 ÷ 3              30    59    87
2 = (3 + 3) ÷ 3        31    60    88
3 = 3                  32    61    89
4 = 3 + (3 ÷ 3)        33    62    90
5 = 3 + 3 - (3 ÷ 3)    34    63    91
6                      35    64    92
7                      36    65    93
8                      37    66    94
9                      38    67    95
10                     39    68    96
11                     40    69    97
12                     41    70    98
13                     42    71    99
14                     43    72    100
15                     44    73    ARE YOU
16                     45    74    SURE YOU
17                     46    75    HAVE REACHED
18                     47    76    YOUR LIMIT?
19                     48    77
20                     49    78    TIME ENDED
21                     50    79
22                     51    80
23                     52    81
24                     53    82
25                     54    83
26                     55    84
27                     56    85
28                     57    86
29                     58
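For scoring, or for checking the form's claim that every number from one to twelve can be found, the set of obtainable numbers can be enumerated. The sketch below works up by the number of threes used. For each count k, it collects every value reachable with exactly k threes by joining two smaller sub-expressions with +, −, × or ÷. This is equivalent to allowing any parenthesization. Exact fractions are kept, so intermediate non-integers such as 3 ÷ 3 ÷ 3 are permitted. The form does not say whether such intermediates were allowed, so that is an assumption, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def values_with_threes(max_threes: int = 6) -> dict[int, set[Fraction]]:
    """Map k -> every value expressible with exactly k threes and + - * / (any parentheses)."""
    exact = {1: {Fraction(3)}}
    for k in range(2, max_threes + 1):
        found = set()
        for i in range(1, k):              # left sub-expression uses i threes, right uses k - i
            for a, b in product(exact[i], exact[k - i]):
                found.update((a + b, a - b, a * b))
                if b != 0:
                    found.add(a / b)
        exact[k] = found
    return exact

def buildable_targets(max_threes: int = 6, low: int = 1, high: int = 100) -> set[int]:
    """Whole numbers in [low, high] reachable with at most max_threes threes."""
    reachable = set().union(*values_with_threes(max_threes).values())
    return {int(v) for v in reachable if v.denominator == 1 and low <= v <= high}

targets = buildable_targets()
print(sorted(targets)[:12])
print(len(targets), "of the numbers 1-100 can be built")
```

A scorer could then mark each answer against `targets`. The enumeration only says which numbers are possible; it does not check the "each number in only one way" rule, which concerns the subject's sheet.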








ON THE CALCULATION OF THE CORRELATION 
BETWEEN A SINGLE ELEMENT OF A COMPOSITE 
AND THE REMAINDER OF THE COMPOSITE 


HERBERT S. CONRAD*
Institute of Child Welfare, University of California 


To overcome errors of measurement (both random and system- 
atic), it is customary, in psychology, to make an additive combination 
of results from a variety of measuring devices. This procedure tends 
not only to counterbalance the errors or biases of measurement pecu- 
liar to any one technique; it frequently serves also (through the 
avoidance of continued repetition of measurement by a single method) 
to promote better effort and cooperation by the persons submitting 
to the measurement-program. 

Assume that measurements $X_1, X_2, X_3, \ldots, X_n$ have been made. These measures may, perhaps, represent the n subtests of a battery; or they may represent a variety of separate measures of some trait, whether by tests, by instrumental techniques, by observational methods, by questionnaires, by rating scales, or by such inferential procedures as are employed in clinical appraisement or psychoanalysis. Assume, further, that the measurements have all been cast into numerical form, and combined by addition into a single composite or total score, $T$. It is desired to know the correlation between $X_1$ and the remainder of the battery of measurements (i.e., $r_{X_1(T-X_1)}$); between $X_2$ and the remainder (i.e., $r_{X_2(T-X_2)}$); etc.

* The writer is indebted to Dr. Harold E. Jones, Dr. Harold D. Carter, and Miss Ruth Krause for reading and criticism of the manuscript. Miss Krause very kindly computed Table I.

Correlations such as $r_{X_1(T-X_1)}$, $r_{X_2(T-X_2)}$, or, in general, $r_{X(T-X)}$, may, of course, be computed directly, in the conventional way. Direct calculation, however, requires that subtractions of the type $T - X$ be made for each person measured. When the number of cases is large enough to justify the use of correlation, this computation becomes a considerable task, especially if the number of separate X-variables is large. From the point of view of the reduction of statistical labor and expense, it would be desirable to use only the tabulated data for $T, X_1, X_2, X_3$, etc., without the additional calculation of $T - X_1, T - X_2, T - X_3$, etc. The most economical procedure, therefore, is to calculate $r_{X_1T}, r_{X_2T}$, etc.,* and to obtain $r_{X_1(T-X_1)}, r_{X_2(T-X_2)}$, etc., from a formula which involves only $X$ and $T$, and not $T - X$. Such a formula is easily derived as follows (in this derivation, capital letters refer, as usual, to raw scores; lower-case letters refer to deviations from the mean):


$$
r_{X(T-X)} = \frac{\Sigma x(t-x)}{N\sigma_x\sigma_{t-x}} = \frac{\Sigma xt - \Sigma x^2}{N\sigma_x\sigma_{t-x}} = \frac{N r_{xt}\sigma_x\sigma_t - N\sigma_x^2}{N\sigma_x\sigma_{t-x}} = \frac{r_{xt}\sigma_t - \sigma_x}{\sigma_{t-x}}
$$

Hence, since $\sigma_{t-x}^2 = \sigma_t^2 + \sigma_x^2 - 2r_{xt}\sigma_x\sigma_t$,

$$
r_{X(T-X)} = \frac{r_{xt}\sigma_t - \sigma_x}{\sqrt{\sigma_t^2 + \sigma_x^2 - 2r_{xt}\sigma_x\sigma_t}} \qquad (1)\dagger
$$

From formula (1) it may be verified (as common sense would suggest) that, other things being equal, the greater the magnitude of $r_{xt}$, the greater the magnitude of $r_{X(T-X)}$. The value of $r_{X(T-X)}$ depends only on the magnitude of $r_{xt}$ and of the ratio $\sigma_t/\sigma_x$.* Table I, presented by way of illustration, shows quantitatively how $r_{X(T-X)}$ varies for different values of $r_{xt}$ and $\sigma_t/\sigma_x$.

TABLE I. VALUES OF $r_{X(T-X)}$ FOR GIVEN VALUES OF $r_{xt}$ AND $\sigma_t/\sigma_x$

                 Value of $\sigma_t/\sigma_x$
$r_{xt}$       2       5      10      20
 .00        -.45    -.20    -.10    -.05
 .20        -.29     .00     .10     .15
 .40        -.11     .21     .31     .36
 .60         .12     .45     .53     .57
 .70         .27     .57     .64     .67
 .80         .45     .71     .76     .78
 .85         .55     .78     .82     .84
 .90         .68     .85     .88     .89
 .95         .82     .92     .94     .94
1.00*       1.00    1.00    1.00    1.00

* It is of some interest to note that for the special, hypothetical case where $r_{xt}$ equals 1.00 and $\sigma_t/\sigma_x$ also equals 1.00, the value of $r_{X(T-X)}$ is indeterminate (0/0). In this case (as shown by the formula for the standard deviation of the difference between correlated series), $\sigma_{T-X}$ equals zero; i.e., $T - X$ is a constant.

* If the intercorrelations between $X_1, X_2, \ldots, X_n$ are known, the values of $r_{X_1T}, r_{X_2T}, \ldots, r_{X_nT}$ may be computed from Spearman's sum-formula (Kelley, T. L.: Statistical Method. New York: Macmillan, 1924, p. 198, formula 149).

† The derivation of this formula (and of the succeeding formulas below) follows the lines of the derivations in Kelley, T. L.: Statistical Method (op. cit.), pp. 196-200.
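Formula (1) is easy to check numerically. The sketch below recomputes the rows of Table I from $r_{xt}$ and k = $\sigma_t/\sigma_x$, taking $\sigma_x = 1$ so that $\sigma_t = k$. It then confirms on a small set of made-up scores that the formula agrees with correlating X and T − X directly. The function names and the data are hypothetical. Population standard deviations are used, although only their ratio matters in formula (1).

```python
import math

def r_part_remainder(r_xt: float, sd_t: float, sd_x: float) -> float:
    """Formula (1): correlation of X with T - X from r_xt, sigma_t and sigma_x."""
    return (r_xt * sd_t - sd_x) / math.sqrt(sd_t**2 + sd_x**2 - 2 * r_xt * sd_x * sd_t)

def pearson_r(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return sab / math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))

def pop_sd(v):
    m = sum(v) / len(v)
    return math.sqrt(sum((p - m) ** 2 for p in v) / len(v))

# Rebuild Table I (sigma_x = 1, so sigma_t equals the ratio k).
for r_xt in (0.00, 0.20, 0.40, 0.60, 0.70, 0.80, 0.85, 0.90, 0.95, 1.00):
    print(f"{r_xt:.2f}", [round(r_part_remainder(r_xt, k, 1.0), 2) for k in (2, 5, 10, 20)])

# Direct check on hypothetical scores: x is one subtest, rest is the sum of the others.
x = [4, 7, 5, 9, 6, 8]
rest = [20, 25, 19, 30, 27, 24]
t = [a + b for a, b in zip(x, rest)]
via_formula = r_part_remainder(pearson_r(x, t), pop_sd(t), pop_sd(x))
direct = pearson_r(x, rest)
print(round(via_formula, 6), round(direct, 6))   # the two agree
```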

If the measurements $X_1, X_2, X_3, \ldots, X_n$ are to be weighted (e.g., by the regression coefficients of a multiple regression equation), and correlations of the type $r_{(wX)(T-wX)}$ are wanted, the derivation and formula are as follows:

$$
r_{(wX)(T-wX)} = \frac{\Sigma(wx)(t-wx)}{Nw\sigma_x\sigma_{t-wx}} = \frac{w\Sigma xt - w^2\Sigma x^2}{Nw\sigma_x\sigma_{t-wx}} = \frac{r_{xt}\sigma_t - w\sigma_x}{\sigma_{t-wx}} = \frac{r_{xt}\sigma_t - w\sigma_x}{\sqrt{\sigma_t^2 + w^2\sigma_x^2 - 2wr_{xt}\sigma_x\sigma_t}} \qquad (2)
$$

If the tabulations and statistical work have all been in terms of the variable $wX$, and not $X$, then $\sigma_x$ will not have been computed, but $\sigma_{wx}$ will be known. To suit such a case, formula (2) may be rewritten as follows ($r_{xt}$ of course equalling $r_{(wx)t}$):

$$
r_{(wX)(T-wX)} = \frac{r_{(wx)t}\sigma_t - \sigma_{wx}}{\sqrt{\sigma_t^2 + \sigma_{wx}^2 - 2r_{(wx)t}\sigma_t\sigma_{wx}}} \qquad (3)
$$





Occasionally an investigator may, for one reason or another, have used an average ($A$) of the $n$ measurements, instead of a total composite score ($T$). In such a case, if unweighted measures were employed, the only modification required of formula (1) is the substitution of $nA$ for $T$, $n\sigma_a$ for $\sigma_t$, and $n^2\sigma_a^2$ for $\sigma_t^2$; $r_{xa}$ of course equals $r_{xt}$. Thus:

$$
r_{X(nA-X)} = \frac{nr_{xa}\sigma_a - \sigma_x}{\sqrt{n^2\sigma_a^2 + \sigma_x^2 - 2nr_{xa}\sigma_a\sigma_x}} \qquad (4)
$$

Similarly, formula (2) may be rewritten:

$$
r_{(wX)(nA-wX)} = \frac{nr_{xa}\sigma_a - w\sigma_x}{\sqrt{n^2\sigma_a^2 + w^2\sigma_x^2 - 2nwr_{xa}\sigma_a\sigma_x}} \qquad (5)
$$

and formula (3) is rewritten:

$$
r_{(wX)(nA-wX)} = \frac{nr_{a(wx)}\sigma_a - \sigma_{wx}}{\sqrt{n^2\sigma_a^2 + \sigma_{wx}^2 - 2nr_{a(wx)}\sigma_a\sigma_{wx}}} \qquad (6)
$$

* This may be shown algebraically by substituting, in formula (1), $k\sigma_x$ for $\sigma_t$, where $k$ is a constant equal to $\sigma_t/\sigma_x$. After simplification, the resulting expression is a function exclusively of $r_{xt}$ and $k$. As shown in Table I above, the larger the value of $k$ or $\sigma_t/\sigma_x$ for a given value of $r_{xt}$, the larger the magnitude of $r_{X(T-X)}$. For formulas (2-6) of this paper, the quantities comparable to $r_{xt}$ and $\sigma_t/\sigma_x$ are, respectively, $r_{xt}$, $r_{t(wx)}$, $r_{xa}$, $r_{xa}$, $r_{a(wx)}$, and $\sigma_t/w\sigma_x$, $\sigma_t/\sigma_{wx}$, $n\sigma_a/\sigma_x$, $n\sigma_a/w\sigma_x$, and $n\sigma_a/\sigma_{wx}$.
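Formulas (2) and (4) can be checked in the same way as formula (1), by computing each correlation both directly and through the formula. The data below are hypothetical: three measures on six persons, a weight w = 2 on the first measure, and n = 3 for the average. The sketch uses a positive weight only. It does not exercise the case of a measure entering the composite by subtraction.

```python
import math

def pearson_r(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return sab / math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))

def pop_sd(v):
    m = sum(v) / len(v)
    return math.sqrt(sum((p - m) ** 2 for p in v) / len(v))

def formula_2(r_xt, sd_t, sd_x, w):
    """r between wX and T - wX, where the composite T already contains wX."""
    return (r_xt * sd_t - w * sd_x) / math.sqrt(sd_t**2 + w**2 * sd_x**2 - 2 * w * r_xt * sd_x * sd_t)

def formula_4(r_xa, sd_a, sd_x, n):
    """r between X and nA - X, where A is the unweighted mean of n measures."""
    return (n * r_xa * sd_a - sd_x) / math.sqrt(n**2 * sd_a**2 + sd_x**2 - 2 * n * r_xa * sd_a * sd_x)

# Three hypothetical measures on six persons.
x1 = [4, 7, 5, 9, 6, 8]
x2 = [10, 12, 9, 15, 14, 11]
x3 = [3, 6, 2, 7, 5, 4]

w = 2.0                                                   # positive weight on x1
t = [w * a + b + c for a, b, c in zip(x1, x2, x3)]        # weighted composite
wx = [w * a for a in x1]
f2 = formula_2(pearson_r(x1, t), pop_sd(t), pop_sd(x1), w)
d2 = pearson_r(wx, [p - q for p, q in zip(t, wx)])

avg = [(a + b + c) / 3 for a, b, c in zip(x1, x2, x3)]    # unweighted mean, n = 3
f4 = formula_4(pearson_r(x1, avg), pop_sd(avg), pop_sd(x1), 3)
d4 = pearson_r(x1, [3 * m - a for m, a in zip(avg, x1)])

print(round(f2, 6), round(d2, 6))
print(round(f4, 6), round(d4, 6))
```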


When the number of subtests or sub-measurements, $X_1, X_2, X_3, \ldots, X_n$, is considerable (say, five or more), it will be found convenient to organize the calculation of the formulas in columnar form. Thus, the calculation of the numerator of formula (1) would be arranged as follows:

(1)           (2)         (3)          (4)          (5)          (6)
X-variable    $r_{xt}$    $\sigma_t$   (2) × (3)    $\sigma_x$   Numerator: (4) − (5)
Subtest 1
Subtest 2
. . .

Arranged in this way, the work of computation may be performed by a statistical clerk, and accurately checked.
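The clerk's columnar procedure translates directly into a row-by-row computation. The sketch below assumes that $r_{xt}$ and $\sigma_x$ are already known for each subtest and $\sigma_t$ for the total. The first six printed columns follow the numerator layout described above. The denominator and final r are added to carry each row through formula (1). All the subtest figures are invented.

```python
import math

# Hypothetical summary statistics: (name, r_xt, sigma_x); sigma_t is shared by all rows.
sigma_t = 18.0
subtests = [("Subtest 1", 0.72, 5.0), ("Subtest 2", 0.65, 4.2), ("Subtest 3", 0.58, 6.1)]

print(f"{'X-variable':<12}{'r_xt':>7}{'s_t':>7}{'(2)x(3)':>9}{'s_x':>7}{'num':>8}{'denom':>8}{'r':>7}")
rows = []
for name, r_xt, sigma_x in subtests:
    col4 = r_xt * sigma_t                      # column (4) = (2) x (3)
    numerator = col4 - sigma_x                 # column (6) = (4) - (5)
    denominator = math.sqrt(sigma_t**2 + sigma_x**2 - 2 * r_xt * sigma_x * sigma_t)
    r_remainder = numerator / denominator      # formula (1) for this subtest
    rows.append((name, r_remainder))
    print(f"{name:<12}{r_xt:>7.2f}{sigma_t:>7.1f}{col4:>9.2f}{sigma_x:>7.1f}"
          f"{numerator:>8.2f}{denominator:>8.2f}{r_remainder:>7.3f}")
```

Keeping each intermediate column visible is what allows the clerk's work to be checked, which is the point of the columnar arrangement.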
The use of the appropriate one of the formulas above offers an advantage not only in economy of statistical labor, but also in completeness of information. Few investigators would undertake to calculate, by the direct method, both $r_{XT}$ and $r_{X(T-X)}$.* By the formulas above, the correlation between an element of a composite and the remainder of the composite is obtained from the correlation between the element and the total composite. In other words, the investigator using any of these formulas ends with full knowledge of both $r_{XT}$ and $r_{X(T-X)}$, at less cost than the direct calculation of $r_{X(T-X)}$ alone.

Formulas (1-6) are so simply derived that any empirical verification may seem superfluous. The writer has, however, had occasion (in four instances) to verify that (within the limitations of errors of grouping) formula (1) gives results identical with those from direct calculation.

* Or both $r_{(wX)T}$ and $r_{(wX)(T-wX)}$; or $r_{XA}$ and $r_{X(nA-X)}$; etc.
Since the correlation as calculated from formulas (1-6) is identical
in magnitude with the correlation that would be obtained by direct 
computation, it is clear that the PE of r from any of these formulas is 
the same as the PE of the regular Pearson r. 

Formulas (1-6) are applicable only when the composite score is formed by the addition (or subtraction*) of the measures $X_1, X_2, X_3, \ldots, X_n$. Multiplicative combinations (such as certain anthropometric indices of "body-build") do not in general lend themselves to any formulas as convenient as (1-6); this perhaps constitutes an argument in favor of the common preference, in psychological work, for additive composites.

It is strongly recommended that the correlation between a subtest 
(or a single measurement) and the remainder of a composite of measure- 
ments be computed either directly, or by the application of the appro- 
priate one of formulas (1-6). The use of a simpler formula, applicable 
to certain artificial combinations,† may involve assumptions which are
insufficiently fulfilled by actual psychological data. 





* Formula (1) and formula (4) assume that the subtests are unweighted (or that the weight of each subtest is positive and equal). For subtests entering the composite by subtraction (as is likely particularly if the weights are taken from a multiple regression equation), the value of the weight, w, in formulas (2), (3), (5), and (6), is negative.

† Musselman, J. R.: "Spurious Correlation Applied to Urn Schemata." Jour. Amer. Stat. Ass., Vol. XVIII, Sept., 1923 (reference quoted from Garrett, H. E.: Statistics in Psychology and Education. New York: Longmans, 1926, p. 261).
EVALUATION OF SCORES OF HIGH-SCHOOL PUPILS
ON DROBA-THURSTONE ATTITUDE-TOWARD-WAR SCALE


ARTHUR E. TRAXLER 
University of Chicago 


The Thurstone Scales for the Measurement of Social Attitudes are 
generally considered to be among the more promising instruments 
for measurement in the field of attitudes. There is reason to think 
that they might be used successfully at the high-school level, but very 
few studies of their use in high school have been published. Only two
have come to the attention of the writer. Peterson and Thurstone¹
used several of the scales in measuring the effect of motion pictures on
the social attitudes of high-school pupils, and Longstreet² employed
five of the scales in a study of the influence of social-science courses 
on the social attitudes of high-school pupils. The emphasis in both of 
these studies was upon change in social attitudes rather than upon an 
evaluation of the utility of the scales in measuring the attitudes of 
high-school pupils. 

The purpose of this article is to report some data on the reliability
and validity of scores made by high-school pupils on one of the 
Thurstone attitude scales—Attitude Toward War, Scale No. 2, 
Forms A and B, devised by D. D. Droba. More specifically, the 
article will report the correlations between Form A and Form B, the 
agreement between the two forms in the placing of pupils in categories 
with reference to attitude toward war, the median scores at the different 
grade levels, and the average difference between the highest and lowest 
scale values of the statements checked by each student. 


PROCEDURE IN GATHERING DATA 


Both forms of Scale No. 2, Attitude Toward War, were given to a 
sampling of the pupils in each of the five years of the University of 
Chicago High School. From twenty-one to twenty-six pupils were
tested in each class. Form A was given first to all classes and was
followed immediately by Form B.





1 Peterson, Ruth C., and Thurstone, L. L.: Motion Pictures and the Social
Attitudes of Children. New York: The Macmillan Company, 1933.


2 Longstreet, R. J.: “An Experiment with the Thurstone Attitude Scales.”
School Review, Vol. XLIII, March, 1935, pp. 202-208.
RELIABILITY


The scores on Form A were correlated with the scores on Form B
for each class and for the whole group of pupils. The reliability
coefficients are shown in Table I.


TABLE I. COEFFICIENTS OF RELIABILITY OF SCORES ON ATTITUDE-TOWARD-WAR
SCALE

Class                 Number of cases     r       PE
Sub-Freshmen (IB)            21          .706    ± .077
Freshmen (I)                 24          .711    ± .067
Sophomores (II)              24          .635    ± .081
Juniors (III)                24          .806    ± .048
Seniors (IV)                 26          .705    ± .068
All classes                 119          .705    ± .031











The lowest of the six correlations shown in Table I is .635 and the 
highest is .806. The other four correlations are approximately .70. 
The data indicate that one may anticipate a reliability of about .70 
when the scale is used with high-school pupils. The scale seems to be 
somewhat less reliable with high-school pupils than with college 
students. Droba¹ found a reliability of .83, using the scores of
four hundred college students. 

A reliability of .70 is ordinarily considered rather low for an achieve- 
ment test or an intelligence test. It is high enough for a comparison 
of groups, but it is too low to be of much value in a study of individuals. 
However, it may be that one cannot expect to find as high relia- 
bility coefficients in testing attitudes as in testing intelligence and 
achievement. 

Each form of the scale consists of only twenty-two statements and 
can be administered within ten or fifteen minutes. Both forms could be 
administered consecutively in less than half an hour and the combined 
scores would have a reliability of about .82 (according to the 
Spearman-Brown formula). This is as high as the reliability of many 
widely used intelligence and achievement tests. 
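The Spearman-Brown figure can be checked directly. For a test lengthened k times the formula is r_k = kr / (1 + (k - 1)r). With k = 2 and a single-form reliability of .70 it gives about .82; with the total-group coefficient of .705 from Table I it gives about .83. A minimal Python sketch (the function name is ours):

```python
def spearman_brown(r: float, k: float = 2.0) -> float:
    """Predicted reliability of a test lengthened k times (Spearman-Brown)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return k * r / (1.0 + (k - 1.0) * r)


print(round(spearman_brown(0.70), 2))   # Forms A and B combined, r = .70
print(round(spearman_brown(0.705), 2))  # using the total-group r of .705
```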

There is another way of considering reliability that may be more 
meaningful when one is dealing with a scale of this kind. It is based 





1 Droba, D. D.: “A Scale for Militarism-Pacifism.” Journal of Educational
Psychology, Vol. XXII, February, 1931, pp. 107-111. 
on the use of a table which the author gives for interpreting the scores.
The table is as follows: 


2.9—Extremely militaristic 
3.9—Strongly militaristic 
4.9—Mildly militaristic 
5.9—Neutral position 
6.9—Mildly pacifistic 
7.9—Strongly pacifistic 
1.0—Extremely pacifistic 


The difference between Form A and Form B in the placement of the 
pupils in the various categories of the table can be used as evidence 
about reliability. These data are shown in Table II.


TABLE II. DIFFERENCE BETWEEN FORM A AND FORM B IN THE PLACEMENT OF
THE PUPILS IN CATEGORIES¹

                            Number                            Per cent
Variation      IB   I   II  III  IV   All      IB     I      II     III    IV     All
None            9   8   10    8  13    48     42.9   33.3   41.7   33.3   50.0   40.3
One step        7  12   11   14   7    51     33.3   50.0   45.8   58.3   26.9   42.9
Two steps       4   3    3    0   5    15     19.0   12.5   12.5    0.0   19.2   12.6
Three steps     1   1    0    2   1     5      4.8    4.2    0.0    8.3    3.9    4.2
Total          21  24   24   24  26   119    100.0  100.0  100.0  100.0  100.0  100.0









































1 In the table, IB means Sub-Freshmen; I, Freshmen; II, Sophomores; III,
Juniors; and IV, Seniors.


The table shows that about forty per cent of the pupils were in the 
same category on both forms of the test and that forty-three per cent
differed by only one step. A difference of one step in the table is
not a very serious difference. Thus, in eighty-three per cent of the 
cases, the agreement between the two forms was quite close. In the 
case of the other seventeen per cent, the difference was two or three 
steps, which is so large a difference that the scale seems to have been 
of very little value for those pupils. 
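The placement comparison behind Table II can be sketched as follows. Because the interpretation table is only partly legible, the sketch assumes that each listed value is the upper limit of its category and that every score above 7.9 is “extremely pacifistic.” The function names and the four pairs of scores are hypothetical; the last line repeats the eighty-three per cent figure from the “All” column of Table II.

```python
from bisect import bisect_left
from collections import Counter
from typing import Dict, List, Tuple

# Assumed upper limits of the first six categories; scores above 7.9
# fall into the seventh ("extremely pacifistic") category.
UPPER_LIMITS = [2.9, 3.9, 4.9, 5.9, 6.9, 7.9]


def category(score: float) -> int:
    """Return 0 (extremely militaristic) through 6 (extremely pacifistic)."""
    return bisect_left(UPPER_LIMITS, score)


def step_differences(form_a: List[float], form_b: List[float]
                     ) -> Tuple[Dict[int, int], Dict[int, float]]:
    """Tally pupils by the number of category steps between Form A and Form B."""
    counts = Counter(abs(category(a) - category(b))
                     for a, b in zip(form_a, form_b))
    n = len(form_a)
    number = dict(sorted(counts.items()))
    per_cent = {steps: round(100.0 * c / n, 1) for steps, c in number.items()}
    return number, per_cent


# Hypothetical scores for four pupils
number, per_cent = step_differences([7.2, 6.5, 8.3, 5.0], [7.6, 7.1, 6.4, 5.2])
print(number)    # {0: 2, 1: 1, 2: 1}
print(per_cent)  # {0: 50.0, 1: 25.0, 2: 25.0}

# Table II, "All" column: 48 + 51 of 119 pupils differed by at most one step
print(round(100 * (48 + 51) / 119, 1))  # 83.2
```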

No important differences between the upper classes and the lower 
ones were shown, although there was a tendency for the Seniors to 
show somewhat less change in categories than the pupils in the other 
classes. 
VALIDITY


It was found in the preceding section of this report that the median 
scores made by high-school pupils on the attitude-toward-war scale 
are fairly, although not highly, reliable. The next question is: Is the 
scale valid—that is, does it measure what it is intended to measure? 

Some evidence about the validity of the scale may be secured by 
noting the differences between the median scores of the lower classes 
and the higher ones. Social science has an important place in the 
program of studies of the University High School and courses in this 
field are required of each pupil throughout the five years. While no 
definite attempt is made to foster pacifism, the attitude toward war 
assumed in the courses is not a favorable one. It would seem, there- 
fore, that the pupils in the upper classes should be more pacifistic 
than pupils in the lower classes and that this trend toward pacifism 
should be reflected in the scores on the scale. Absence of a trend of 
this kind would not be indisputable evidence against the validity of the 
scale, since it is possible that opposition to war is not definite enough 
in the courses to influence pupil attitude, but presence of such a trend 
would be fair evidence for the validity of the scale. 

The medians of the scores made by the pupils at the different class 
levels are shown in Table III. 


TABLE III. MEDIAN SCORES ON ATTITUDE-TOWARD-WAR SCALE

Group                  Form A   Form B   Both forms
Sub-Freshmen (IB)        6.8      ...       14.5
Freshmen (I)             6.9      7.5       14.4
Sophomores (II)          7.3      8.0       15.3
Juniors (III)            7.8      7.2       14.3
Seniors (IV)             7.1      7.8       14.9
All classes              ...      ...       14.8














There is little difference between the median scores of any of the
groups. The Sophomore medians are slightly the highest on both
forms of the tests. The Sub-freshman and Freshman medians are 
practically as high as the Junior and Senior medians. As far as these 
data are indicative of the situation, there is not a progressively more 
pacifistic attitude with advancement in class-level. 

There are two reasons why this fact may not be unfavorable 
evidence about the validity of the scale. In the first place, nearly 
all the medians fall within the category “strongly pacifistic.” It is
obvious that if pupils are strongly pacifistic when they enter the 
University High School the social science courses and the other school 
influences will not greatly increase that pacifistic attitude unless they 
adopt an extremely pacifistic stand. 

In the second place, the regular work of the school, which is directed 
mainly at understanding, may not carry over to influence pupil 
attitude toward war. Longstreet¹ found that in schools where no definite
attempt was made to alter attitude there was no change in the pupils’
attitudes toward war, as measured by the Thurstone scale, while in a
school where teaching was directed toward altering attitude, the
Thurstone scale did show a change in attitude. One can interpret
this finding to mean that if a change in attitude is actually brought 
about, it can be measured by the Thurstone scale. 

Therefore, the absence of change in score from class to class in the 
University High School probably should not be considered evidence 
against the validity of the scale, but it is significant that this criterion 
of validity was applied and that positive evidence of validity was 
lacking. 

There is a more crucial test of validity than the one which was just 
used. It seems reasonable to assume that if this is a valid test of 
attitude toward war there will be considerable internal consistency 
in the scale. In other words, each pupil should show the same general 
trend in all the statements he checks, although not all of the statements 
will have identical scale values. If all the statements with which a 
pupil agrees fall in the same part of the scale, one may be fairly sure 
that at least some aspects of his attitude toward war are sampled by 
the statements. If, on the other hand, the statements with which he
says he is in agreement are scattered throughout the length of the
scale, there is much reason to doubt that attitude toward war is being
measured. In such a case, perhaps some other factors, such as 
vocabulary or reading ability, enter in to such an extent as to vitiate 
the measurement of attitude. It is, therefore, important to investigate 
the degree to which the high-school pupils were consistent in their 
checking of statements with which they agreed. 

The medians of the differences between the highest scale value and
the lowest scale value of the statements checked by each pupil are
shown in Table IV.





1 Longstreet: Op. cit., p. 208. 
Table IV shows that the median difference between the scale value
of the most pacifistic statement and the most militaristic statement
checked by each pupil is 7.3 points for Form A and 7.0 points for
Form B. The greatest possible difference is 10.1 points for Form A
and 10.2 points for Form B. Some of the statements that fall within
the extremely militaristic category are not separated from some of
those in the extremely pacifistic category by a difference of as much
as seven points. It thus appears that the typical performance of the
pupils on both forms of the scale was not consistent. Although the
pupils tended to agree with more pacifistic than militaristic statements,
most of them checked some statements that were highly favorable
toward war.

TABLE IV. MEDIANS OF DIFFERENCES BETWEEN HIGHEST SCALE VALUE AND
LOWEST SCALE VALUE OF STATEMENTS CHECKED BY EACH PUPIL

Group                  Form A   Form B
Sub-Freshmen (IB)        6.3      7.3
Freshmen (I)             7.8      7.5
Sophomores (II)          6.1      6.8
Juniors (III)            6.8      7.0
Seniors (IV)             7.5      6.8
All classes              7.3      7.0

Miller,¹ in a study of the Peterson-Thurstone War Attitude Scale,
found a similar tendency for the average student in the first two years
of college to agree with statements which have widely divergent scale
values. The mean range of the scale values of the items checked by
each student was 7.2 in a possible range of 10.8 units. The present
findings with high-school pupils on the Droba-Thurstone scale are
approximately the same.

The pupils at the upper levels of the high school did not tend to be 
more consistent in their checking of the statements than did the pupils 
in the lower classes. The median range was as great in the Junior 
and Senior classes as it was in the Sub-Freshman class. At all levels 
of the school there was such a diversity in the statements with which 
individual pupils agreed that, as far as high-school pupils are concerned, 
a question may well be raised in regard to what it is that the scale 





1 Miller, L. W.: “A Critical Analysis of the Peterson-Thurstone War Attitude
Scale.” Journal of Educational Psychology, Vol. XXV, December, 1934,
pp. 662-668.
really measures. This spread in the value of the statements checked
by the pupils is the most unfavorable evidence about the scale that 
was found in this study. 
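Both quantities examined in this section can be computed for each pupil from the scale values of the statements he checks. The sketch below takes the pupil's score to be the median scale value of the statements endorsed (the usual Thurstone scoring rule, consistent with the summary's “median scale values”) and the range to be the highest minus the lowest value checked. The checked values are hypothetical. The final loop expresses the typical ranges quoted above as fractions of the greatest possible range.

```python
from statistics import median
from typing import List, Tuple


def pupil_summary(checked: List[float]) -> Tuple[float, float]:
    """Return (score, range) for the scale values of one pupil's checked statements."""
    if not checked:
        raise ValueError("the pupil checked no statements")
    return median(checked), max(checked) - min(checked)


# Hypothetical pupil who also endorses one very militaristic statement
score, spread = pupil_summary([1.5, 7.4, 8.2, 9.0])
print(round(score, 1), spread)  # 7.8 7.5

# Typical range as a fraction of the greatest possible range
for label, typical, possible in [("Form A", 7.3, 10.1),
                                 ("Form B", 7.0, 10.2),
                                 ("Miller", 7.2, 10.8)]:
    print(label, round(typical / possible, 2))
```

A score near the pacifistic end can thus coexist with a range covering most of the scale, which is the pattern the author finds troubling.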


SUMMARY 


The median scale values of the statements checked by high-school 
pupils in the Droba-Thurstone attitude-toward-war scale are fairly, 
but not highly, reliable. As to validity, there tends to be so much 
difference between the highest scale value and the lowest scale value 
of the statements checked by individual pupils that it is doubtful 
whether the scale really measures what it is intended to measure, when 
it is used at the high-school level.

No conclusion about the reliability or validity of the scores of 
high-school pupils on the other Thurstone scales may be inferred from 
a study of this one scale. It would be desirable to investigate the
adaptability of some of the other scales to the high school in similar
fashion.
SEX DIFFERENCES IN MENTAL GROWTH


F. H. FINCH 


University of Minnesota 


Terman and his co-workers, in the third volume of their report of 
the investigation of gifted children,' set forth the results of retesting a 
superior group of children some six years after the original tests had 
been given. Widespread attention has been attracted by that phase 
of the report dealing with a sex difference in changes in intelligence 
quotient as determined by the Stanford Revision. Specifically, 
twenty-seven boys and twenty-seven girls, from what is described as 
the Regular group, show average decreases of three and seventeen
points, respectively, in IQ. On the basis of this limited evidence the
authors have generalized as follows:


The data point with considerable force to the conclusion that changes in ability 
over a term of years in such a group as ours are due chiefly to “change-of-rate”
factors inherent in the individuals concerned, and that such factors are correlated 
with sex. Boys not only become increasingly more likely than girls to have a high 
IQ as they advance in age, but they are more likely than girls to retain a high IQ 
earlier evidenced (p. 62). 


In addition to the Regular group already mentioned, there are 
included in Table V (p. 25) data on eleven boys and eight girls from a 
group referred to as Outside cases. One of these boys, whom the 
authors believe to have been coached for the retest, showed a gain of 
forty-nine points in IQ (Table XVII, p. 44). When this case is 
eliminated, the remaining ten Outside boys appear to have decreased 
on the average 8.2 points in IQ. The Outside girls, on the other hand, 
make an average gain of 0.5. These Outside data, meagre as they are, 
raise considerable question concerning the conclusion just quoted. 
When the Outside cases (except the one boy who was coached) are 
included with the Regular cases, the totals become thirty-seven boys 
and thirty-five girls, and the difference of fourteen points in favor of 
boys falls to 8.5. It thus appears that the authors have attached 
undue significance to what may be a chance difference that might 
completely vanish with the inclusion of a few additional cases. 
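The pooled figure can be checked by weighting each group's average change by its number of cases. The sketch uses the rounded averages quoted above, so it reproduces the author's 8.5 only to within rounding (it gives about 8.6). The function name is ours.

```python
from typing import Iterable, Tuple


def pooled_mean(groups: Iterable[Tuple[int, float]]) -> float:
    """Weighted mean of several group means; each group is (n, mean)."""
    groups = list(groups)
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n


# Average change in IQ: Regular group plus Outside cases (coached boy omitted)
boys = pooled_mean([(27, -3.0), (10, -8.2)])    # 37 boys
girls = pooled_mean([(27, -17.0), (8, 0.5)])    # 35 girls
print(round(boys, 1), round(girls, 1), round(boys - girls, 1))
```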

An earlier study directed by Thorndike,’ and employing the 
I. E. R. tests, failed to reveal any conclusive evidence of a sex differ- 
ential in rate of gain. However, since the interval between the initial 
and final tests was in no case more than one year, it is entirely possible 
that an existing difference might not have been discovered by this
attack. 

Cattell? found in the Harvard Growth Study records of nine boys 
and eleven girls above one hundred twenty IQ who had been retested 
with the Stanford Revision after an average interval of six years. 
She reported an average gain in IQ of twelve points for boys and only 
seven points for girls. At lower intelligence levels she found very 
small differences favoring boys. 

Lincoln‘ has also presented from the Harvard Growth Study certain 
data collected on children above one hundred eighteen IQ through 
reexaminations with the Stanford Revision after an interval of five 
to eight years. He found the average loss of fifty-four girls to be 
six points greater than that of thirty-eight boys for whom he presented 
data. In a later report Lincoln’ included additional cases that 
increased his numbers to sixty-four girls and forty-five boys. For 
these groups the change in means favored the boys by 5.4 points, 
while the change in medians showed a difference of five points in the 
same direction. Both Cattell and Lincoln considered records from 
children tested at approximately the same ages that were represented 
in the Stanford data, where a maximum of thirteen years at retest 
prevailed. 

Parker® has reported the results of retesting children in the same 
general age range, but of subnormal intelligence. He also employed 
the Stanford Revision, using the abbreviated form. He believes that 
he has found evidence indicating a prepubertal spurt in mental growth 
occurring earlier for girls than for boys, and that girls reach their 
maximum earlier than boys. His results on three groups are presented 
by ages, but without including any statement of the number of cases 
in each. Curves plotted from his material reveal that one group 
clearly agrees with his conclusion, one shows a slight trend in that 
direction, and the third reverses this trend. 

McConnell’s® retest data were obtained through the use of the 
American Council Test on seventy college students. An interval of 
forty-three months elapsed between the original and final tests. 
The mean gain in raw score of the women exceeded that of the men
by 11.85 ± 6.84. The standard deviation of the original scores was
forty-six; that of the final scores was forty-eight. No information 
is given as to the number of cases of each sex. 

The data presented below are for several reasons not directly com- 
parable to those upon which previous findings are based. In the first 
place, both the age level and the interval between initial and final test
are among factors that may influence the results, and in neither of
these have the conditions of previous studies been duplicated. Furthermore,
the test used consists of items differing materially from those
in the Stanford Revision which Terman, Cattell, Lincoln, and Parker 
have employed. The Miller Mental Ability Test, Form A, was used 
in both initial and final testings. It should be pointed out that Miller’s 
test rates superior students somewhat higher than do other tests in 
common use.* Records were obtained by testing students at the 
time of their entrance to University High School, University of Minne- 
sota, and again during the winter prior to their graduation in June. 
Members of three graduating classes were included. 

Usable records were available on one hundred three girls and 
one hundred one boys. Age at the time of the first test, and interval 
between tests, expressed in months, were: 





Boys Girls 





M | SD | M | SD 





176. 


Age at first test (in months)................. 173.8 11. 4 
46.7 10.7 


Interval between tests (in months)............ 47.7 



















A comparison of changes in IQ for the two sexes based on the total 
group may be made from the following: 














                                 Boys             Girls
Number of cases                   101               103
Mean IQ, first test          137.6 ± 1.45     132.3 ± 1.29
Mean IQ, second test         136.4 ±  .99     137.4 ±  .98
SD, first test                   21.55            19.35
SD, second test                  15.35            14.76
Change in mean IQ                 -1.2              5.1





The mean IQ for these boys dropped 1.2, while that of the girls 
increased 5.1, making the relative gain of the girls over the boys 6.3. 
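The ± values attached to the means are probable errors. Assuming the conventional formula PE = .6745σ/√N (our assumption; the article does not state it), the first-test entries for the total group can be reproduced from the standard deviations (21.55 and 19.35) and the numbers of cases:

```python
import math


def probable_error_mean(sd: float, n: int) -> float:
    """Probable error of a mean: 0.6745 * sd / sqrt(n)."""
    if n < 1:
        raise ValueError("need at least one case")
    return 0.6745 * sd / math.sqrt(n)


print(round(probable_error_mean(21.55, 101), 2))  # boys, first test: 1.45
print(round(probable_error_mean(19.35, 103), 2))  # girls, first test: 1.29
```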

When those cases falling below one hundred twenty IQ on the first 
test were excluded there remained seventy-nine boys and eighty girls. 
From the following figures a comparison similar to the above may be 
made for these restricted groups: 
                                 Boys             Girls
Number of cases                    79                80
Mean IQ, first test          144.9 ± 1.38     139.6 ± 1.10
Mean IQ, second test         141.0 ±  .92     141.9 ±  .89
SD, first test                   18.19            14.57
SD, second test                  12.07            11.77
Change in mean IQ                 -3.9              2.3











The change in mean IQ, again favoring the girls, is in this instance
6.2.


Further restriction of the data by excluding cases below one hundred
thirty IQ on the original test reduced the numbers to sixty-two boys
and fifty-one girls. For these cases the basis for comparison on IQ
changes follows:














                                 Boys             Girls
Number of cases                    62                51
Mean IQ, first test          150.6 ± 1.41     147.7 ± 1.16
Mean IQ, second test         143.4 ± 1.01     144.9 ± 1.19
SD, first test                   16.47            12.25
SD, second test                  11.62            12.62
Change in mean IQ                 -7.2             -2.8





While both sexes are here lower in the second than in the first
test, the change still favors the girls, the difference being 4.4.

A more highly selected group was obtained by excluding those
cases below one hundred forty IQ as determined by the original test.
There now remained only forty-three boys and thirty-five girls.
Computations for these limited data, similar to those made for the
above groups, result in the following:








                                 Boys             Girls
Number of cases                    43                35
Mean IQ, first test          157.4 ± 1.58     154.1 ± 1.06
Mean IQ, second test         146.7 ± 1.11     146.9 ± 1.52
SD, first test                   15.38             9.32
SD, second test                  10.78            13.30
Change in mean IQ                -10.7             -7.2
Thus in this small group including only cases of relatively high
intelligence the change in mean IQ again favors the girls, the difference 
being 3.5. To recapitulate, the sex differences in relative gains are 
6.3, 6.2, 4.4, and 3.5, and in each instance they favor the girls. 
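The four relative gains follow from the first- and second-test means in the tables above: each is the girls' change in mean IQ minus the boys' change. A minimal sketch (the dictionary layout and names are ours):

```python
from typing import Dict, Tuple

Pair = Tuple[float, float]  # (first-test mean, second-test mean)

# Means transcribed from the four tables above: (boys, girls)
GROUPS: Dict[str, Tuple[Pair, Pair]] = {
    "all cases":       ((137.6, 136.4), (132.3, 137.4)),
    "first IQ >= 120": ((144.9, 141.0), (139.6, 141.9)),
    "first IQ >= 130": ((150.6, 143.4), (147.7, 144.9)),
    "first IQ >= 140": ((157.4, 146.7), (154.1, 146.9)),
}


def relative_gain(boys: Pair, girls: Pair) -> float:
    """Girls' change in mean IQ minus boys' change, to one decimal."""
    change_boys = boys[1] - boys[0]
    change_girls = girls[1] - girls[0]
    return round(change_girls - change_boys, 1)


for name, (boys, girls) in GROUPS.items():
    print(name, relative_gain(boys, girls))
```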

It will be observed that in those groups restricted to the higher 
intelligence levels there is a decrease in mean IQ from the first to the 
second test. This apparent loss is in part at least created by basing
the restriction upon the results of the original test, and thereby
introducing a regression toward the mean in the intelligence quotients
obtained on the final test.
A second factor which also contributes to produce this effect is the 
inadequacy of the test for older subjects of extremely high ability. 
While the test is subject to this limitation in less degree than most tests 
designed for use with high-school students, it is not completely satis- 
factory for use with very superior cases at the upper age-levels. For 
example, the individual who has attained the age of sixteen is limited, 
even with a perfect score, to an IQ of one hundred sixty-seven, and 
probably is restricted to an important extent when his true ability 
approaches that level. There is in the available data, however, no 
indication that either of these factors is influencing the means of the 
two sexes in very different fashion. Instead, it is more probable that 
the means for both sexes are depressed to some extent below their 
true value by the second factor, but that the difference between the 
means is roughly the same as would be observed were an instrument 
entirely adequate from this standpoint available. 
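The first factor, regression produced by selecting on the original test, can be illustrated with a small simulation. The model is an assumption made purely for illustration: every pupil has a fixed “true” IQ, each testing adds independent error, and nothing real changes between tests. The population mean, the error size, and the cutoff are hypothetical.

```python
import random
import statistics
from typing import Tuple


def simulate_selection(n: int = 20000, cutoff: float = 140.0,
                       seed: int = 1) -> Tuple[int, float, float]:
    """Return (number selected, mean on test 1, mean on test 2) for pupils
    selected on test 1 only. True IQ does not change between tests."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n):
        true_iq = rng.gauss(130.0, 15.0)               # hypothetical population
        first.append(true_iq + rng.gauss(0.0, 8.0))    # hypothetical test error
        second.append(true_iq + rng.gauss(0.0, 8.0))
    chosen = [i for i, x in enumerate(first) if x >= cutoff]
    mean_first = statistics.mean(first[i] for i in chosen)
    mean_second = statistics.mean(second[i] for i in chosen)
    return len(chosen), mean_first, mean_second


count, m1, m2 = simulate_selection()
print(count, round(m1, 1), round(m2, 1))
```

The selected pupils' second mean falls back toward the population mean even though no real change was simulated, which is the effect the author attributes to selecting on the first test.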

It is, of course, impossible to establish from the data at hand that 
the observed differences in change in mean IQ represent real sex 
differences in rate of gain in general intelligence. It is well known that 
on certain types of test material boys will show performance superior 
to that of girls, while on other types of test items girls will show a higher 
average performance. Such a sex differential in test items is only one 
of the possible limiting factors of this and other studies dealing with sex 
differences.*5 Even though the limitations of the present study are 
such that it throws little light upon the question of sex difference in 
rate of gain in general intelligence during the period of adolescence, it 
nevertheless serves to raise some question regarding the widely quoted 
conclusions of certain earlier work, and to reemphasize the statement 
of Goodenough, who has written, ‘“‘the question of mental precocity 
as related to sex must be answered in terms of specific functions or 
traits, rather than in terms of unanalysed general tendencies’’‘ (p. 459). 
A STUDY OF SOME PRACTICAL CONSIDERATIONS
INVOLVED IN THE USE OF TWO EDUCATIONAL 
TEST BATTERIES 


HENRIETTE WOOLF AND CHRISTINE LIND 
Institute for Juvenile Research, Chicago, Illinois


The introduction of a new educational test battery gives rise to a 
question in the mind of an educational psychologist as to the advisi- 
bility of replacing the old battery with the new one. Before doing 
so it is advisable to make a comparative study of the two tests, both 
from the standpoint of test requirements and administrative differ- 
ences. This problem was the basis for a comparative study of the 
New Stanford Achievement Test and the Modern School Achievement 
Test after the appearance of the latter in 1932. 

Although the purpose of the study was to select the more valuable 
educational test in determining academic placement, the batteries 
were not studied by comparing the content of the individual tests. 
Both were prepared by able educators and subjected to rigid statistical 
evaluation. For this reason the administrative differences and the 
agreement with intelligence test ratings were studied. Previous 
investigators have reported a close relationship between educational 
achievement and mental measurements. In a survey of twelve such 
studies St. John¹ found correlations ranging from -.15 to +.91. The
typical correlation between the intelligence quotient and the achieve- 
ment test score was +.56. 

One hundred boys and girls, all in their sixteenth year on March 1,
1932, were chosen from the consecutive admissions to the State School 
for Boys and the State School for Girls in Illinois between March 1, 
1932 and March 1, 1933. The reported educational background of 
the group ranged from the special room to the third year of high school. 
However, neither a school history nor a test record was available 
for the majority of the group, which necessitated careful educational 
and mental measurement for effective school placement. The tests 
were administered on two non-consecutive days, one group intelligence 
test and one achievement battery being given each day. The interval 
between test days was short so that the possibility of training was 
negligible. Each individual was also given a Stanford Binet Test. 





1 St. John, C. W.: “Educational Achievement in Relation to Intelligence.”
Harvard Studies in Education, No. 15, 1930, pp. 219.
Since this study was for the purpose of selecting the more valuable
educational test in determining academic placement it was thought 
advisable for statistical comparisons to use the measures of central 
tendency provided by the authors. In the New Stanford Achieve- 
ment Test this is the average of the individual tests, while for the 
Modern School Achievement Test it is the median of the ten tests. 
Since the chronological ages of the group were constant, the
intelligence quotients and educational quotients were used as a means
of facilitating statistical computations.

Comparisons of like individual tests in the two educational scales 
were made on the basis of raw scores since it is not possible to convert 
the scores of each section of the Modern School Achievement Test into
comparable ratings. In this battery the grade-range and age-range 
are not the same for each test, neither are the age-grade equivalents 
the same for all sections of the battery. In statistical evaluations 
involving final ratings it was not possible to use raw scores since the 
composite ratings are not obtained from actual scores either on the 
Modern School Achievement Test or on the Stanford Binet. 
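Using quotients in place of age scores leaves the correlations unchanged when chronological age is the same for everyone. Each quotient, 100 × (mental or educational age) / chronological age, is then the age score multiplied by a single constant. The sketch below checks this with hypothetical ages in months; the names and values are illustrative only.

```python
import math
from typing import Sequence


def quotient(age_score: float, chronological_age: float) -> float:
    """Ratio quotient (IQ or EQ): 100 * age score / chronological age."""
    return 100.0 * age_score / chronological_age


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Product-moment correlation computed from deviation scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) *
                           sum((b - my) ** 2 for b in y))


CA = 186                              # hypothetical common age, in months
ea = [150, 162, 171, 180, 195]        # hypothetical educational ages
ma = [148, 170, 165, 184, 200]        # hypothetical mental ages
eq = [quotient(a, CA) for a in ea]
iq = [quotient(a, CA) for a in ma]
print(round(pearson_r(ea, ma), 4), round(pearson_r(eq, iq), 4))
```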

The subject-matter and range of the New Stanford Achievement 
Test and of the Modern School Achievement Test are very similar. 
For that reason the correlation between the two tests of .89 ± .015
was not unexpected. The average difference in educational quotient 
was +4.33. Since the correlation is so high the differentiation 
between these tests is a problem of administration and relationship 
to mental tests. The following table compares the mean and sigma 
of the two educational tests and gives the correlation of each with 
the intelligence tests used. 


TABLE I. COMPARISON OF INTELLIGENCE QUOTIENTS AND EDUCATIONAL
QUOTIENTS

                                                          Correlation with
Test                          Mean          Sigma    Stanford-Binet  Haggerty    Otis
New Stanford Achievement     81.55 ± .74    11.02      .80 ± .02    .89 ± .01  .82 ± .02
Modern School Achievement    79.25 ± .73    10.78      .74 ± .03    .83 ± .02  .80 ± .02























There seems to be very little difference between these two batteries 
in so far as correlation with individual and group mental tests is 
concerned. However, the correlations vary from two to six points
in favor of the New Stanford Achievement Test. These correlations 
indicate that both tests give equally consistent educational ratings 
compared to the achievement which might be expected from the 
mental levels of the students. 

The tests differ somewhat in administration and form; in the
writers’ opinion each one has certain advantages. The typographical
set-up of the New Stanford Achievement Test is excellent. The 
type face used is large and the questions are so arranged that only 
underlining is necessary in most of the tests. This facilitates reading 
speed and accuracy. In the Modern School Achievement Test it is 
necessary to copy the number of the answer for many of the tests. 
The sections of this latter test contain two to four pages each in 
irregular succession. In the New Stanford Achievement Test only 
the first and last tests contain more than two pages, thus allowing the
student to have an open booklet for most of the tests and eliminating 
the necessity of folding the booklet back and confusion with the 
following test. 

There is a slight difference in the selection of subject-matter 
in the two tests. The New Stanford Achievement Test includes 
tests of Word Meaning and of Literature. The Modern School 
Achievement Test includes a test of Speed and Accuracy in Reading 
and a test of Elementary Science. With the present emphasis on
skill in reading it would seem more important to have separate measures
of speed and comprehension than of comprehension of long and
short units in reading, as in the New Stanford Achievement Test.

It is necessary to allow an examination period of three hours for 
the Modern School Achievement Test and a period of two hours and a 
half for the New Stanford Achievement Test. Seven parts of the
latter test have uniform time limits, and four parts of the Modern School
Achievement Test are uniformly timed. The directions for giving
the test are better standardized in the New Stanford Achievement 
Test and it is necessary to use the manual to give the test. For the 
Modern School Achievement Test the examiner must use both the 
manual and a test blank in giving directions. This test allows an 
examiner familiar with the ideas to give the test without a manual. 

In scoring the tests it is somewhat easier and quicker to score the 
Modern School Achievement Test since numbers are used for most 
of the answers and the answers are arranged in columnar form. The 
scoring of the New Stanford Achievement Test is somewhat of a
strain on the eyes. This is particularly true of the spelling test,
since it is often a test of the legibility of the writing as well. The
spelling of only one word in each sentence in the Modern School 
Achievement Test is decidedly advantageous. In this battery the 
scoring is completely objective throughout, while in the New Stanford 
Achievement Test some leeway is permitted in scoring the test of 
Paragraph Meaning. 

The outstanding difference between the two batteries is in the 
treatment of the scores in individual tests and the composite rating. 
For the New Stanford Achievement Test, Advanced Examination, 
the results range from a grade level of 2.6 to 10.0, with corresponding 
ages given for each month of each grade. It is possible to obtain 
lower and higher scores than this, however, and for the higher scores 
age norms are available, while for the lower scores it is possible to 
use norms from the Primary Examination. For all practical purposes 
the range of age-grade equivalents is constant and a given score 
can be compared with another score or the composite score with equal 
facility. 

In the Modern School Achievement Test the lower grade limit 
is 2.0 for some tests, 3.0 for others, with the symbol L given for scores 
lower than those numerically stated. The highest grade limit given 
is 8.9 in some cases, in others 9.0 with the symbol H for a score above 
average, and V for a score exceeding that of the top twenty-five per 
cent of pupils at the end of eighth grade. The difference in the range 
of possible scores should be noted. On this test it is not possible 
to compare scores from one test with another since the age-grade 
equivalents are not kept constant from test to test. It is only 
possible to compare two grade-levels, or two age-levels. The final 
rating is the median of either the age or grade levels but one cannot be 
inferred from the other. Since no numerical rating is assigned at 
either the lower or the upper end of the scale it is not useful for testing 
dull children near the beginning of their school career, nor bright 
ones near the end of elementary school. 

A correlation was computed for similar tests in each battery. 
These data follow in Table II. In this table the names of the indi- 
vidual tests are given at the left with those of the New Stanford 
Achievement Test first. 

As would be expected from the correlation between the two batteries
as a whole, high correlations were obtained for most of the tests.
The highest correlations were in spelling, reading, and arithmetic.
This was probably due to the rather definite standardization of the
curriculum in these subjects. The mean age and grade were lower in
five subjects for the Modern School Achievement Test than for the
New Stanford Achievement Test. This apparently indicates that
the Modern School Achievement Test is the more difficult.


TABLE II. COMPARISON OF EIGHT LIKE INDIVIDUAL TESTS OF ACHIEVEMENT

Subject                    Correlation   Mean score   Sigma   Mean age   Mean grade
Paragraph meaning          .83 ± .02        76.9      17.30     12-2        6.3
Reading comprehension                       47.5      13.58     12-9        7.0
Language usage             .568 ± .04       71.9      22.52     11-8        5.8
Language usage                              29.2       8.31     11-4        5.6
Spelling                   .90 ± .01        75.9      17.06     12-0        6.2
Spelling                                    51.4      16.26     12-0        6.2
[illegible]                .66 ± .04        72.1      19.09     11-8        5.8
[illegible]                                 22.6       9.21     12-0        6.1
[illegible]                .76 ± .03        73.1      19.87     11-9        5.9
[illegible]                                 26.1       8.95     11-6        5.7
Physiology, hygiene        .68 ± .04        82.9      15.82     12-10   [illegible]
Health knowledge                            31.1       9.90     12-0        6.2
Arithmetic reasoning       .80 ± .02        78.8      15.01     12-4        6.6
Arithmetic reasoning                        17.1       7.91     11-7        5.7
Arithmetic computation     .76 ± .03        71.7      16.49     11-8        5.8
Arithmetic computation                      16.9       7.72     11-0        5.2
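The statement that the Modern School means were lower in five subjects can be checked against Table II. Because one mean grade (physiology and hygiene) is illegible in the source, this sketch compares mean ages instead, converting the printed years-months values to months. The pair labels, including the two generic ones standing in for illegible subject names, are hypothetical identifiers chosen for this example.

```python
# Mean ages from Table II as printed ("years-months"), as
# (subject, New Stanford, Modern School). Labels are hypothetical identifiers;
# "Illegible pair A/B" stand in for subject names lost in the source.
TABLE_II_MEAN_AGE = [
    ("Reading", "12-2", "12-9"),
    ("Language usage", "11-8", "11-4"),
    ("Spelling", "12-0", "12-0"),
    ("Illegible pair A", "11-8", "12-0"),
    ("Illegible pair B", "11-9", "11-6"),
    ("Physiology / health knowledge", "12-10", "12-0"),
    ("Arithmetic reasoning", "12-4", "11-7"),
    ("Arithmetic computation", "11-8", "11-0"),
]


def age_to_months(age: str) -> int:
    """Convert an age written as 'years-months' (e.g. '12-10') to total months."""
    years, months = age.split("-")
    return int(years) * 12 + int(months)


def compare_mean_ages(rows: list) -> dict:
    """Group subjects by whether the Modern School mean age is lower, equal, or higher."""
    tally = {"lower": [], "equal": [], "higher": []}
    for subject, stanford, modern in rows:
        diff = age_to_months(modern) - age_to_months(stanford)
        if diff < 0:
            tally["lower"].append(subject)
        elif diff == 0:
            tally["equal"].append(subject)
        else:
            tally["higher"].append(subject)
    return tally


if __name__ == "__main__":
    for outcome, subjects in compare_mean_ages(TABLE_II_MEAN_AGE).items():
        print(f"{outcome}: {len(subjects)} {subjects}")
```

Run on the table values, this tallies five subjects lower, one equal (spelling), and two higher for the Modern School Achievement Test, which agrees with the text.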








Although the arrangement of the spelling test in the two batteries
is so dissimilar, the best correlation was obtained in this subject, and
the age-grade levels were exactly the same at the mean. In arithmetic
it might be expected that the correlation would be lower, since the
time allowed for computation of sixty problems is thirty minutes in
the New Stanford Achievement Test, while for thirty-five problems
in the Modern School Achievement Test twenty minutes is allowed.
The difference in the number of problems given and the range in the
complexity of the problems would seem to be relatively unimportant.
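The time allowances for arithmetic computation translate into an average working time per problem. This minimal sketch uses exact fractions so the comparison is not blurred by rounding. It takes only the problem counts and time limits stated in the text and, like the text, ignores differences in problem complexity; the names used are illustrative.

```python
from fractions import Fraction

# Arithmetic computation tests as described in the text: (problems, minutes allowed).
COMPUTATION_TESTS = {
    "New Stanford Achievement Test": (60, 30),
    "Modern School Achievement Test": (35, 20),
}


def seconds_per_problem(problems: int, minutes: int) -> Fraction:
    """Average time available per problem, in seconds, as an exact fraction."""
    return Fraction(minutes * 60, problems)


if __name__ == "__main__":
    for battery, (problems, minutes) in COMPUTATION_TESTS.items():
        secs = seconds_per_problem(problems, minutes)
        print(f"{battery}: {secs} s per problem (about {float(secs):.1f} s)")
```

For the figures given, the New Stanford test allows 30 seconds per problem and the Modern School test 240/7, or about 34.3, seconds.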
Although the number of cases in this study is limited and the
age-level is fixed, the writers believe that the data cover a sufficient
range in educational and mental age so that the material is applicable
to other groups.

The results suggest the following conclusions:

1. The New Stanford Achievement Test shows a slight superiority
in statistical evaluations compared to the Modern School Achievement 
Test. 

2. The New Stanford Achievement Test tends to give slightly 
higher age- and grade-levels according to the means obtained in a 
comparison of individual tests. 

3. Each of these batteries has administrative advantages. In the 
opinion of the writers the New Stanford Achievement Test is preferable,
primarily because of the conversion of all scores into comparable
age-grade norms with a wider range of scores. 

4. Although these tests differ in certain administrative features, 
they are so similar in content and in correlation with other tests that 
the use of either is justified. 
BOOK REVIEWS


J. P. Guilford. Laboratory Studies in Psychology: A Manual and
Workbook for Students. New York: Henry Holt and Co., 1934,
pp. 289.


The author states in the preface: “The order of the studies may be
altered to coordinate with almost any elementary text, but it was 
especially designed to integrate with a course which uses Woodworth’s 
Psychology as a text.” 

The forty studies listed comprise, for the most part, the stock 
experiments of the psychological laboratory. There are fewer 
examples of the “brass instrument” type than are usually found in
manuals of this type. The study of the gross anatomy of the sheep’s 
brain seems to be new; at least the reviewer has not met it before. 
The manual is well designed, but some of the illustrations are poorly 
reproduced. The binding is of such a nature that the book cannot 
be opened flat; and this is a disadvantage. PETER SANDIFORD. 

University of Toronto. 


E. George Payne. Readings in Educational Sociology, Volume II.
New York: Prentice-Hall Inc., 1934, pp. IX + 793.


This is a companion volume to the one which the author had 
published in 1932. It is essentially a compilation of the significant 
materials emphasizing the application of sociology to educational 
practice. That the material which the author has selected is among 
the best of its kind, there is no doubt. The extensive list of acknowl- 
edgments to authors of known repute in their respective fields attests 
to this fact. 

The table of contents includes the following topics: “The School
as a Social Agency,” “The Expanding Function of Education,”
“Health Education,” “Civic Education,” “Character Education,”
“Adult Education,” “Vocational Education and Guidance,” “Special
Education,” “Creative and Progressive Education,” “The Curriculum,”
“The Sociological Method,” “Child Guidance,” “School
Organization,” “Measuring the Results of Education,” and “Sociological
Research in Education.” This list would appear to be
reasonably complete. The reviewer is not convinced that certain
fundamental skills, appreciations, and habits other than social habits
can be relegated wholly to the background in the treatment of a
balanced program of educational activities, even from the sociologists’
point of view. Certainly, there must be a few habits and skills which 
have personal value to the individual and which may be satisfactorily 
learned apart from the social situation for future use. Skills devel- 
oped in arithmetic, reading, and spelling are examples. This is not 
intended as a criticism of the volume, but rather, as a suggestion of a 
possible oversight on the part of teachers in thinking that all of the 
fundamental skills can be effectively taught incidentally in the class- 
room. The reviewer believes that educators, psychologists, and 
sociologists have contributions to make in the task of revising educa- 
tional practice. But no one group should be so presumptuous as 
to monopolize the entire project. There should be a more equitable 
distribution of the offerings of these groups of workers in this respect 
than in the past. ROBERT G. SIMPSON.
Carnegie Institute of Technology. 


George G. Campion and Grafton E. Smith. The Neural Basis of
Thought. New York: Harcourt, Brace and Company, 1934, 
pp. VIII + 167. 


The appearance during recent years of many treatises seeking to 
identify neural structure and mental function reflects not only the 
interest but also the progress achieved in this field. The present 
book, although inspired partly by research on neural function, is 
largely concerned with hypotheses which must be either confirmed or 
rejected by future experimentation. 

The discussion leads up to and emphasizes the intimate integration 
of thalamic and cortical impulses in thinking. The authors’ hypothesis 
holds that the thalami, considered by some to be centers for the affec- 
tive aspect of sensation, also propagate streams of impulses to those 
“neural schemata” involved in thought and receive streams of
impulses from the cortex. Thus cortico-thalamic and thalamo- 
cortical impulses form integrated neural patterns during thinking. 
Although it is admitted that this is an hypothesis which may be con- 
firmed only by a “process of induction” extending through another
generation, they have confidence that their view may ultimately 
result in a psychological law which will harmonize and integrate the 
various viewpoints or fields of psychology. Few readers will expect 
as much. 

Throughout the book the tendency to speculate rather than to 
depend upon induction is prominent. Although few will accept the 
views here presented, many will be interested in this novel attempt
at systematic organization of certain data in the field of physiological
psychology. MILES A. TINKER.
University of Minnesota. 


Arthur G. Bills. General Experimental Psychology. New York:
Longmans, Green and Co., 1934, pp. X + 620. 


Textbooks of experimental psychology have always suffered from 
a lack of generality which has, in most cases, precluded their widespread
adoption. This almost universal defect is a natural consequence
of the aim in these books to outline experimental procedures as well 
as to present topical information. And since the number of experi- 
ments must have some limit, authors have been guided in their selec- 
tion by their own research interests and by the exigencies of apparatus 
limitations. Recently many instructors have adopted the practice 
of preparing manuals for their own laboratory classes—usually 
mimeographed booklets of directions with a minimum of informative 
content. Consequently the need has become more urgent for a general 
experimental text. Most readers will agree that Bills’ book is of the 
right type to fill this need. It is not designed to replace the laboratory 
manual, but to supplement it. It offers the student a ready access 
to the background and present status of the problems he investigates. 

But while approving of the book’s general plan and purpose, the 
critical reader may still deplore some of its aspects. In general, the 
selection and treatment of the material are perhaps too largely a 
reflection of the author’s own views and interests. Nearly two-thirds 
of the space is given over to topics of Learning and Memory, Associa- 
tion and Thought, and Work and Fatigue. And while the treatment 
here is both authoritative and complete, yet one can but regret the 
brevity and inadequacy of the sections on the sensory and perceptual 
processes. The author’s use of outline form here is not always effec- 
tive, for while it permits greater condensation, it does not enhance 
the clarity of comprehension—especially when the outline is reduced 
to the mere statement of the topic!

The relegation of statistics to a few pages in the appendix will 
be disappointing to many. The reviewer doubts the value of such a 
brief summary to the beginning student and to the instructor who may 
“wish to make the teaching of a few fundamentals of statistics a part 
of the experimental course.” 
Many will also deplore the complete absence of material in the
fields of individual differences, intelligence testing, and aptitude
measurement. Experiments along these lines are becoming increasingly
popular in many places, and beginning students appear to derive more 
benefits from the practical procedures of assessing their own capabilities 
and potentialities than from many of the traditional and outmoded 
problems. 

However much one may feel the lack of perfection in some depart- 
ments, he must certainly recognize in this book an excellent review of 
the fields of learning, memory, work, and fatigue, and above all a 
worthy example of the much needed experimental text—a supplement 
to the laboratory manual. E. DONALD SISSON.

University of Minnesota. 


Percival M. Symonds. Mental Hygiene of the School Child. New
York: The Macmillan Company, 1934, pp. 321. 


The mental hygiene movement which started with an interest in 
mental diseases and developed interests in social problems such as 
delinquency has, for some years, been interested in normal children. 
However, in mental hygiene literature the emphasis quite often is on 
the etiology of abnormal behavior, and words connoting abnormal 
are not infrequently used. This frequency of referral to things 
abnormal in mental hygiene literature has all too often been responsible 
for teachers disregarding the literature. The general assumption 
sometimes is that their concern is with normal children, not abnormal
ones. To serve these teachers, books which contain insights in mental 
hygiene but which do not refer so frequently to abnormal behavior 
are needed. Symonds’ Mental Hygiene of the School Child is such a 
book. 

The book is written in relatively simple language, questions 
follow every chapter, frequent illustrations of points made are given, 
and all in all the content and style are of a nature that should appeal 
to teachers. Unlike many other books in mental hygiene, this one
includes a chapter on learning and a chapter on drives as well
as a chapter on behavior mechanisms. Some of the behavior patterns
briefly dealt with are: tendency to be doing something other than the
job at hand, restlessness in study and easy distraction, tardiness,
exhibition of anxiety over mistakes, tendency to play for the center
of the stage, tendency to make alibis for failures, tendency to
solitariness, and exaggerated courtesy. There are fairly adequate chapters
on positive habits of mental hygiene as well as negative habits to be 
avoided, and sex adjustments. In these chapters the subjects are not 
dealt with in any detailed manner but enough is there to arouse the 
interest of the teacher and to point to a healthier point of view than 
is generally prevalent amongst teachers. Other topics dealt with 
in the book include the role of the teacher in mental hygiene, discipline, 
school organization, psychological services in the school, interviewing 
and the case study, remedial work, and the adjustment of the teacher. 
The last chapter is devoted to a presentation of selected case studies. 
Justifiably the book may be said to be, in spots at least, over- 
simplified and in other spots over-academic. Portions of it might be 
said to suggest a lack of insights that go with clinical experiences, but 
all in all this book is a useful addition to the mental hygiene literature 
that is likely to interest school teachers. H. MELTZER.
Psychological Service Center, St. Louis. 


N. P. Neilson and F. W. Cozens. Achievement Scales in Physical
Education Activities. New York: A. S. Barnes & Company, 
1934, pp. X + 171. 


Upon the basis of the test results of more than seventy-nine 
thousand California boys and girls in school grades five to nine the 
authors have constructed achievement scales for thirty-three physical 
education activities. The scales are constructed in terms of a classi- 
fication which combines the three factors of age, height, and weight. 
The selection of these factors was determined by their ease of rapid 
measurement in the school situations, the pupil’s inability to “quickly
and wilfully modify” them, and their apparent influence “on
performance in specific activities.”

In addition to the scales, very careful descriptions of the tests 
are presented, along with precise instructions for giving and scoring 
them. Sample classification charts, age computing charts, and individ- 
ual and group scoring records are also included, thus making the 
use of the scales a relatively simple matter for the physical education 
instructor. A table of the position of the achievement scores in a 
normal distribution is appended. 

The present reviewer is not qualified to comment upon the signifi- 
cance of the tests themselves as a measure of physical development or 
skill, but he is impressed by the scope of the scales and the scientific 
care with which they have been designed and presented. The authors’
comments on the value of competition are psychologically sound. 

The field of physical education has, perhaps, been somewhat 
neglected in the development of objective methods of educational 
measurement. It would seem that this book fulfills a real need and 
goes far toward accomplishing the purposes enumerated by the 
authors: 

“1. To stimulate pupils to have an interest in all-round physical 
development; 

“2. To interest pupils in their play through a fair evaluation of 
their efforts; 

“3. To supplement the routine physical examination by finding 
pupils’ strengths, weaknesses, and skill status so that an activity 
program may be adapted to their needs; 

“4. To measure pupils’ improvement in skills; and

“5. To aid in further research and experimentation in the physical 
education field.” CARLETON F. SCOFIELD.

University of Buffalo. 